Abstract



We consider linear regression in the high-dimensional regime in which the number of observations
$n$ is smaller than the number of parameters $p$. A very successful approach in this setting
uses $\ell_1$-penalized least squares (a.k.a. the Lasso) to search for a subset of $s_0 < n$ parameters
that best explain the data, while setting the other parameters to zero. A considerable amount of
work has been devoted to characterizing the estimation and model selection problems within this
approach.

In this paper we consider instead the fundamental, but far less understood, question of
statistical significance.

We study this problem under the random design model in which the rows of the design matrix 
are i.i.d. and drawn from a high-dimensional Gaussian distribution. This situation arises, for 
instance, in learning high-dimensional Gaussian graphical models. Leveraging an asymptotic
distributional characterization of regularized least squares estimators, we develop a procedure 
for computing p-values and hence assessing statistical significance for hypothesis testing. We 
characterize the statistical power of this procedure, and evaluate it on synthetic and real data, 
comparing it with earlier proposals. Finally, we provide an upper bound on the minimax power 
of tests with a given significance level and show that our proposed procedure achieves this bound 
in the case of design matrices with i.i.d. Gaussian entries.

1 Introduction 

The Gaussian random design model for linear regression is defined as follows. We are given $n$ i.i.d.
pairs $(y_1, x_1), (y_2, x_2), \dots, (y_n, x_n)$ with $y_i \in \mathbb{R}$ and $x_i \in \mathbb{R}^p$,
$x_i \sim N(0, \Sigma)$ for some covariance matrix $\Sigma \succ 0$. Further, $y_i$ is a linear function
of $x_i$, plus noise

$$ y_i = \langle \theta_0, x_i \rangle + w_i, \qquad w_i \sim N(0, \sigma^2). \tag{1} $$

Here $\theta_0 \in \mathbb{R}^p$ is a vector of parameters to be learned and $\langle \cdot, \cdot \rangle$ is
the standard scalar product. The special case $\Sigma = I_{p \times p}$ is usually referred to as the
'standard' Gaussian design model.

In matrix form, letting $y = (y_1, \dots, y_n)^{T}$ and denoting by $X$ the matrix with rows
$x_1^{T}, \dots, x_n^{T}$, we have

$$ y = X \theta_0 + w, \qquad w \sim N(0, \sigma^2 I_{n \times n}). \tag{2} $$
We are interested in high dimensional settings where the number of parameters exceeds the sample
size, i.e., $p > n$, but the number of non-zero entries of $\theta_0$ is smaller than $p$. In this situation, a
recurring problem is to select the non-zero entries of $\theta_0$, which can hence provide a succinct explanation
of the data. The vast literature on this topic is briefly overviewed in Section 

In statistical applications, it is unrealistic to assume that the set of nonzero entries of $\theta_0$ can
be determined with absolute certainty. The present paper focuses on the problem of quantifying
the uncertainty associated with the entries of $\theta_0$. More specifically, we are interested in testing
null hypotheses of the form:

$$ H_{0,i} : \; \theta_{0,i} = 0, \tag{3} $$

for $i \in [p] \equiv \{1, 2, \dots, p\}$ and assigning p-values for these tests. Rejecting $H_{0,i}$ is equivalent
to stating that $\theta_{0,i} \neq 0$.

Any hypothesis testing procedure faces two types of errors: false positives or type I errors
(incorrectly rejecting $H_{0,i}$ while $\theta_{0,i} = 0$), and false negatives or type II errors (failing to
reject $H_{0,i}$ while $\theta_{0,i} \neq 0$). The probabilities of these two types of errors will be denoted,
respectively, by $\alpha$ and $\beta$ (see Section 2.1 for a more precise definition). The quantity $1 - \beta$ is
also referred to as the power of the test, and $\alpha$ as its significance level. It is trivial to achieve
$\alpha$ arbitrarily small if we allow for $\beta = 1$ (never reject $H_{0,i}$) or $\beta$ arbitrarily small if we allow
for $\alpha = 1$ (always reject $H_{0,i}$). This paper aims at optimizing the trade-off between power
$1 - \beta$ and significance $\alpha$.

Without further assumptions on the problem structure, the trade-off is trivial and no non-trivial
lower bound on $1 - \beta$ can be established. Indeed we can take $\theta_{0,i} \neq 0$ arbitrarily close to 0, thus
making $H_{0,i}$ in practice indistinguishable from its complement. We will therefore assume that,
whenever $\theta_{0,i} \neq 0$, we have $|\theta_{0,i}| > \mu$ as well. The smallest value of $\mu$ such that the power and
significance reach some fixed non-trivial value (e.g., $\alpha = 0.05$ and $1 - \beta \ge 0.9$) has a particularly
compelling interpretation, and provides an answer to the following question:

What is the minimum magnitude of $\theta_{0,i}$ to be able to distinguish it from the noise level,
with a given degree of confidence?

Recently Zhang and Zhang [ZZ11] and Bühlmann [Büh12] proposed hypothesis testing procedures
for design matrices $X$ satisfying the restricted eigenvalue property [BRT09]. When specialized
to the case of standard Gaussian designs $x_i \sim N(0, I_{p \times p})$, these methods require
$|\theta_{0,i}| \ge \mu = c \max\{\sigma s_0 \log p / n, \; \sigma/\sqrt{n}\}$ to reject hypothesis $H_{0,i}$ with a given degree
of confidence, where $c$ is a constant independent of the problem dimensions (see Appendix A).

In this paper we prove that a significantly stronger test can be constructed, at least in an
asymptotic sense, for some Gaussian designs. Indeed we show that $|\theta_{0,i}| \ge c\,\sigma/\sqrt{n}$ is sufficient
for $H_{0,i}$ to be rejected. This is somewhat surprising. Even if $n = p$ and the measurement directions are
orthogonal, e.g., $X = \sqrt{n}\, I_{n \times n}$,¹ we would need $|\theta_{0,i}| \ge c\,\sigma/\sqrt{n}$ to distinguish the
$i$-th entry from noise.

As in [ZZ11, Büh12], our approach is based on the Lasso estimator [Tib96]

$$ \hat\theta(y, X) = \arg\min_{\theta \in \mathbb{R}^p} \Big\{ \frac{1}{2n} \|y - X\theta\|_2^2 + \lambda \|\theta\|_1 \Big\}. \tag{4} $$



¹The factor $\sqrt{n}$ in $X = \sqrt{n}\, I_{n \times n}$ is required for a fair comparison with the standard
Gaussian design, where the columns of $X$ have $\ell_2$ norm close to $\sqrt{n}$ with high probability and
hence the signal-to-noise ratio is $\mu/\sigma$.
Unlike [ZZ11, Büh12], the precise test and its analysis are based on an exact asymptotic
distributional characterization of high-dimensional estimators of the type (4). This characterization
was proved in [BM11, BM12] for standard Gaussian designs, $\Sigma = I_{p \times p}$. We further generalize these
results to a broad class of covariances $\Sigma$, and general regularized least squares estimators, by using
the non-rigorous replica method from statistical mechanics [MM09].

The contributions of this paper are organized as follows: 

Upper bound on the minimax power. In Section 2 we introduce the problem formally, by taking
a minimax point of view. We prove a general upper bound on the minimax power of tests
with a given significance level $\alpha$. We then specialize this bound to the case of standard Gaussian
design matrices, showing formally that no test can detect $\theta_{0,i} \neq 0$ unless
$|\theta_{0,i}| \ge \mu_{\rm UB} = c\,\sigma/\sqrt{n}$.

Hypothesis testing procedure for standard Gaussian designs. Building on the results of [BM12],
we describe in Section 3.1 a test that is well suited for the case of standard Gaussian designs,
$\Sigma = I_{p \times p}$. We prove that this test achieves a 'nearly-optimal' power-significance trade-off in
a properly defined asymptotic sense. Here 'nearly optimal' means that the trade-off has the
same form, except that $\mu_{\rm UB}$ is replaced by $\mu = C \mu_{\rm UB}$ with $C$ a universal constant.

Generalization to nonstandard Gaussian designs. For $\Sigma \neq I_{p \times p}$, no rigorous characterization
analogous to the one of [BM12] is available. Using the non-rigorous replica method, we derive
a conjecture for a broad class of covariances $\Sigma$ and general regularized least squares estimators,
that we will call the standard distributional limit (see Sections 3.2 and 4). Assuming that the
standard distributional limit holds, we develop in Section 3.2 a hypothesis testing procedure
for this more general case, that we refer to as SDL-test.



Validation. We validate our approach on both synthetic and real data in Sections 3.1.1, 3.2.1 and
Section 5, comparing it with the method of [Büh12]. Simulations suggest that the latter is
indeed overly conservative, resulting in suboptimal statistical power.

Proofs are deferred to Section 6.

This paper focuses on the asymptotic regime introduced in [Don06, DT05, DT09, DT10] and
studied in [DMM09, DMM11, BM12]. The advantage of this approach is that the asymptotic
characterization of [BM12] is sharp and appears to be accurate already at moderate sizes.

A forthcoming paper [JM13] will address the same questions in a non-asymptotic setting.



1.1 Further related work 

As mentioned above, regularized least squares estimators were the object of intense theoretical
investigation over the last few years. The focus of this work has so far been on establishing order optimal
guarantees on: (1) the prediction error $\|X(\hat\theta - \theta_0)\|_2$ [GR04]; (2) the estimation error, typically
quantified through $\|\hat\theta - \theta_0\|_q$, with $q \in [1,2]$ [CT07, BRT09, KWY09]; (3) the model selection (or
support recovery) properties, e.g., by bounding $P\{\mathrm{supp}(\hat\theta) \neq \mathrm{supp}(\theta_0)\}$ [MB06, ZY06, Wai09]. For
establishing estimation and support recovery guarantees, it is necessary to make specific assumptions
on the design matrix $X$, such as the restricted eigenvalue property of [BRT09] or the compatibility
condition of [vdGB09]. Both [ZZ11] and [Büh12] assume conditions of this type for developing
hypothesis testing procedures.
As mentioned above, our guarantees assume the Gaussian random design model. This was
fruitfully studied in the context of standard linear regression [HKZ11], as well as sparse recovery.
Donoho and Tanner [Don06, DT05, DT09, DT10] studied the noiseless case $\sigma = 0$, for standard
Gaussian designs $\Sigma = I_{p \times p}$, and reconstruction using basis pursuit, i.e., the $\lambda \to 0$ limit of the Lasso
estimator (4). They considered the asymptotic behavior as $s_0, p, n \to \infty$ with $s_0/p \to \varepsilon \in (0,1)$
and $n/p \to \delta \in (0,1)$. They proved that, depending on the values of $(\varepsilon, \delta)$, the unknown vector
$\theta_0$ is either recovered exactly with probability converging to one or not recovered with probability
converging to one, and characterized the boundary between these regimes.

Wainwright [Wai09] considered the Gaussian design model and established upper and lower
thresholds $n_{\rm UB}(p, s_0; \Sigma)$, $n_{\rm LB}(p, s_0; \Sigma)$ for correct recovery of $\mathrm{supp}(\theta_0)$ in noise $\sigma > 0$, under an
additional condition on $\mu \equiv \min_{i \in \mathrm{supp}(\theta_0)} |\theta_{0,i}|$. Namely, for $n, p, s_0 \to \infty$ with
$n \ge n_{\rm UB}(p, s_0; \Sigma)$, $P\{\mathrm{supp}(\hat\theta) = \mathrm{supp}(\theta_0)\} \to 1$, while for $n, p, s_0 \to \infty$ with
$n \le n_{\rm LB}(p, s_0; \Sigma)$, $P\{\mathrm{supp}(\hat\theta) = \mathrm{supp}(\theta_0)\} \to 0$. For the special case of standard Gaussian
designs, both $n_{\rm LB}(p, s_0; \Sigma = I)$ and $n_{\rm UB}(p, s_0; \Sigma = I)$ are asymptotically equivalent to
$2 s_0 \log(p)$, hence determining the threshold location. More generally
$n_{\rm UB}(p, s_0; \Sigma) = O(s_0 \log p)$ for many covariance structures $\Sigma$, provided $\mu = \Omega(\sqrt{\log p / n})$. Correct
support recovery depends, in a crucial way, on the irrepresentability condition of [ZY06].

In the regime $n = \Theta(s_0 \log p)$ that is relevant for exact support recovery, both type I and type II
error rates tend to 0 rapidly as $n, p, s_0 \to \infty$. This makes it difficult to study the trade-off between
statistical significance and power, and the optimality of testing procedures. Further, the techniques
of [Wai09] (which are built on the results of [ZY06]) do not allow one to estimate the type I and type II error
rates $\alpha$ and $\beta$ separately, but only to bound their sum as $\alpha + \beta \le p^{-c}$ for $c > 0$ depending on the
level of regularization and the scaling of various dimensions.

Here we are interested in triples $n, p, s_0$ for which $\alpha$ and $\beta$ stay bounded away from 0 and from
the trivial baseline $\alpha + \beta = 1$. As shown in Section 2, any hypothesis testing method that achieves
this requires $\mu = \Omega(n^{-1/2})$ and $n > s_0$. Since further we build on the Lasso estimator (4), we need
$n = \Omega(s_0 \log(p/s_0))$ by the results of [DMM11, BM12]. In other words, the regime of interest for
standard Gaussian designs is $c_1 s_0 \log(p/s_0) \le n \le c_2 s_0 \log(p)$. At the lower end the number of
observations $n$ is so small that essentially nothing can be inferred about $\mathrm{supp}(\theta_0)$ using an optimally
tuned Lasso estimator, and therefore a nontrivial power $1 - \beta > \alpha$ cannot be achieved. At the upper
end, the number of samples is sufficient to recover $\mathrm{supp}(\theta_0)$ with high probability, leading
to arbitrarily small errors $\alpha$, $\beta$. We consider the asymptotic scaling $s_0/p \to \varepsilon \in (0,1)$, which is indeed
the asymptotic scaling covered by [BM12], and previously studied in [Don06, DT05, DT09, DT10]
and [DMM09, DMM11]. In Section 3 we apply the results of [BM12] to this asymptotic regime, and
develop a test that achieves non-trivial power, $\beta < 1 - \alpha$, provided $n > 2\,(1 + O(s_0/p))\, s_0 \log(p/s_0)$.
We indeed show that our proposed testing procedure achieves a nearly optimal power-significance
trade-off.

Several papers study the estimation error under Gaussian design models, including [RWY10,
CR11, CP10]. The last one, in particular, considers a much more general setting than the Gaussian
one, but assumes again $n = \Omega(s_0 \log p)$.

As mentioned above, the asymptotic characterization of [BM12] only applies to standard Gaussian
designs. A similar distributional limit is not available for general covariance matrices $\Sigma \neq I_{p \times p}$.
In Section 3.2 we conjecture such a limit on the basis of non-rigorous statistical physics calculations
presented in Appendix B. Assuming that this conjecture holds, we derive a corresponding
hypothesis testing procedure. It is worth mentioning that the replica method from statistical physics
has already been used by several groups for analyzing sparse regression problems, in particular by
Rangan, Fletcher, Goyal [RFG09], Kabashima, Tanaka and Takeda [KWT09, TK10], Guo, Baron and
Shamai [GBS09], Wu and Verdú [WV11]. Earlier work applying the same method to the analysis
of large CDMA systems includes [Tan02, GV05] (whose results were, in part, rigorously confirmed
in [MT06, GW08]). This line of work is largely aimed at deriving asymptotically exact expressions
for the risk of specific estimators, e.g., the Lasso or the Bayes optimal (minimum MSE) estimator.
Most of the previous work in this line is limited to the standard Gaussian setting. Exceptions include
[TK10, TCSV11, KMV12], but they are limited either to the noiseless setting [TK10, KMV12] or to
other matrix ensembles [TCSV11, KMV12]. To the best of our knowledge, the present paper is the
first that applies the same techniques to high-dimensional hypothesis testing. Further, we consider
a broader setting than the standard Gaussian one.

Let us finally mention that resampling methods provide an alternative path to assess statistical
significance. A general framework to implement this idea is provided by the stability selection method
of [MB10]. However, specializing the approach and analysis of [MB10] to the present context does
not provide guarantees superior to those of [ZZ11, Büh12], which are more directly comparable to the
present work.

1.2 Notations 

We provide a brief summary of the notations used throughout the paper. For an $n \times p$ matrix $X$,
$X_S$ denotes the $n \times |S|$ matrix with column indices in $S$. Likewise, for a vector $\theta \in \mathbb{R}^p$, $\theta_S$ is the
restriction of $\theta$ to indices in $S$. We denote the rows of the design matrix $X$ by $x_1, \dots, x_n \in \mathbb{R}^p$. We
also denote its columns by $\tilde{x}_1, \dots, \tilde{x}_p \in \mathbb{R}^n$. The support of a vector $\theta \in \mathbb{R}^p$ is denoted by
$\mathrm{supp}(\theta)$, i.e., $\mathrm{supp}(\theta) = \{i \in [p] : \theta_i \neq 0\}$. We use $I$ to denote the identity matrix in any
dimension, and $I_{d \times d}$ whenever it is useful to specify the dimension $d$. Throughout,
$\Phi(x) \equiv \int_{-\infty}^{x} e^{-t^2/2}\, dt / \sqrt{2\pi}$ is the CDF of the standard normal distribution.

2 Minimax formulation 

2.1 Tests with guaranteed power 

We consider the minimax criterion to measure the quality of a testing procedure. In order to define 
it formally, we first need to establish some notations. 

A testing procedure for the family of hypotheses $H_{0,i}$, cf. Eq. (3), is given by a family of
measurable functions

$$ T_i : \mathbb{R}^n \times \mathbb{R}^{n \times p} \to \{0, 1\}, \tag{5} $$
$$ (y, X) \mapsto T_{i,X}(y). \tag{6} $$

Here $T_{i,X}(y) = 1$ has the interpretation that hypothesis $H_{0,i}$ is rejected when the observation is
$y \in \mathbb{R}^n$ and the design matrix is $X$. We will hereafter drop the subscript $X$ whenever clear from the
context.

As mentioned above, we will measure the quality of a test $T$ in terms of its significance level $\alpha$
(probability of type I errors) and power $1 - \beta$ ($\beta$ is the probability of type II errors). A type I error
(false rejection of the null) leads one to conclude that a relationship between the response vector $y$
and a column of the design matrix $X$ exists when in reality it does not. On the other hand, a type
II error (the failure to reject a false null hypothesis) leads one to miss an existing relationship.
Adopting a minimax point of view, we require that these metrics are achieved uniformly over
$s_0$-sparse vectors. Formally, for $\mu > 0$, we let

$$ \alpha_i(T) \equiv \sup\big\{ P_\theta(T_{i,X}(y) = 1) \; : \; \theta \in \mathbb{R}^p, \; \|\theta\|_0 \le s_0, \; \theta_i = 0 \big\}, \tag{7} $$

$$ \beta_i(T; \mu) \equiv \sup\big\{ P_\theta(T_{i,X}(y) = 0) \; : \; \theta \in \mathbb{R}^p, \; \|\theta\|_0 \le s_0, \; |\theta_i| \ge \mu \big\}. \tag{8} $$

In words, for any $s_0$-sparse vector with $\theta_i = 0$, the probability of false alarm is upper bounded by $\alpha$.
On the other hand, if $\theta$ is $s_0$-sparse with $|\theta_i| \ge \mu$, the probability of misdetection is upper bounded
by $\beta$. Note that $P_\theta(\cdot)$ is the induced probability distribution on $(y, X)$ for random design $X$ and noise
realization $w$, given the fixed parameter vector $\theta$. Throughout we will accept randomized testing
procedures as well.²

Definition 2.1. The minimax power for testing hypothesis $H_{0,i}$ against the alternative $|\theta_i| \ge \mu$ is
given by the function $1 - \beta_i^{\rm opt}(\,\cdot\,; \mu) : [0,1] \to [0,1]$ where, for $\alpha \in [0,1]$,

$$ 1 - \beta_i^{\rm opt}(\alpha; \mu) \equiv \sup_T \big\{ 1 - \beta_i(T; \mu) \; : \; \alpha_i(T) \le \alpha \big\}. \tag{9} $$

Note that for standard Gaussian designs (and more generally for designs with exchangeable
columns), $\alpha_i(T)$, $\beta_i(T; \mu)$ do not depend on the index $i \in [p]$. We shall therefore omit the subscript
$i$ in this case.

Remark 2.2. The optimal power $\alpha \mapsto 1 - \beta_i^{\rm opt}(\alpha; \mu)$ is non-decreasing. Further, by using a test
such that $T_{i,X}(y) = 1$ with probability $\alpha$ independently of $y$, $X$, we conclude that
$1 - \beta_i^{\rm opt}(\alpha; \mu) \ge \alpha$.

2.2 Upper bound on the minimax power 

In this section we develop an upper bound for the minimax power $1 - \beta_i^{\rm opt}(\alpha; \mu)$ under the Gaussian
random design model. Our basic tool is a simple reduction to the binary hypothesis testing problem.

Definition 2.3. Let $Q_0$ be a probability distribution on $\mathbb{R}^p$ supported on
$\Omega_0 \equiv \{\theta \in \mathbb{R}^p : \|\theta\|_0 \le s_0, \; \theta_i = 0\}$, and $Q_1$ a probability distribution supported on
$\Omega_1 \equiv \{\theta \in \mathbb{R}^p : \|\theta\|_0 \le s_0, \; |\theta_i| \ge \mu\}$. For
fixed design matrix $X \in \mathbb{R}^{n \times p}$, and $z \in \{0, 1\}$, let $P_{Q,z,X}$ denote the law of $y$ as per model (1) when
$\theta_0$ is chosen randomly with $\theta_0 \sim Q_z$.

We denote by $1 - \beta^{\rm bin}_{i,X}(\alpha; Q)$ the optimal power for the binary hypothesis testing problem $\theta_0 \sim Q_0$
versus $\theta_0 \sim Q_1$, namely:

$$ \beta^{\rm bin}_{i,X}(\alpha; Q) \equiv \inf_T \big\{ P_{Q,1,X}(T_{i,X}(y) = 0) \; : \; P_{Q,0,X}(T_{i,X}(y) = 1) \le \alpha \big\}. \tag{10} $$

The reduction is stated in the next lemma. 

Lemma 2.4. Let $Q_0$, $Q_1$ be any two probability measures supported, respectively, on $\Omega_0$ and $\Omega_1$ as
per Definition 2.3. Then, the minimax power for testing hypothesis $H_{0,i}$ under the random design
model, cf. Definition 2.1, is bounded as

$$ \beta_i^{\rm opt}(\alpha; \mu) \ge \inf \big\{ E\, \beta^{\rm bin}_{i,X}(\alpha_X; Q) \; : \; E(\alpha_X) \le \alpha \big\}. \tag{11} $$

Here expectation is taken with respect to the law of $X$ and the inf is over all measurable functions
$X \mapsto \alpha_X$.

For the proof we refer to Section 6.1.

²Formally, this corresponds to assuming $T_i(y) = T_i(y; U)$ with $U$ uniform in $[0,1]$ and independent of the other
random variables.



The binary hypothesis testing problem is addressed in the next lemma by reducing it to a simple
regression problem. For $S \subseteq [p]$, we denote by $P_S$ the orthogonal projector on the linear space
spanned by the columns $\{\tilde{x}_i\}_{i \in S}$. We also let $P_S^{\perp} = I_{n \times n} - P_S$ be the projector on the orthogonal
subspace. Further, for $\alpha \in [0,1]$ and $u \in \mathbb{R}_+$, define the function $G(\alpha, u)$ as follows:

$$ G(\alpha, u) \equiv 2 - \Phi\big(\Phi^{-1}(1 - \tfrac{\alpha}{2}) + u\big) - \Phi\big(\Phi^{-1}(1 - \tfrac{\alpha}{2}) - u\big). \tag{12} $$

In Fig. 1 the values of $G(\alpha, u)$ are plotted versus $\alpha$ for several values of $u$.
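As a quick numerical companion to Eq. (12), the following Python sketch evaluates $G(\alpha, u)$ with the standard library only. The function name `G` and the use of `statistics.NormalDist` are choices made for this illustration and are not part of the original paper; $G(\alpha, u)$ coincides with the power of a two-sided level-$\alpha$ test on the mean of a unit-variance Gaussian whose true mean is $u$.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def G(alpha: float, u: float) -> float:
    """Evaluate G(alpha, u) of Eq. (12); alpha must lie in (0, 1]."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # Phi^{-1}(1 - alpha/2)
    return 2.0 - _STD_NORMAL.cdf(z + u) - _STD_NORMAL.cdf(z - u)


if __name__ == "__main__":
    # Print a few values at significance level 0.05 for increasing u.
    for u in (0.0, 1.0, 2.0, 4.0):
        print(f"G(0.05, {u}) = {G(0.05, u):.4f}")
```

At $u = 0$ the function returns $\alpha$, and it increases towards 1 as $u$ grows.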
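Lemma 2.5. Let $X \in \mathbb{R}^{n \times p}$ and $i \in [p]$. For $S \subset [p] \setminus \{i\}$, $\alpha \in [0,1]$, define

$$ 1 - \beta^{\rm oracle}_{i,X}(\alpha; S, \mu) = G\Big(\alpha, \frac{\mu \, \|P_S^{\perp} \tilde{x}_i\|_2}{\sigma}\Big). $$

If $|S| < s_0$ then for any $\xi > 0$ there exist distributions $Q_0$, $Q_1$ as per Definition 2.3, depending on
$i$, $S$, $\mu$ but not on $X$, such that $\beta^{\rm bin}_{i,X}(\alpha; Q) \ge \beta^{\rm oracle}_{i,X}(\alpha; S, \mu) - \xi$.

The proof of this Lemma is presented in Section 6.2.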

Using Lemmas 2.4 and 2.5, we obtain the following upper bound on the optimal power of random
Gaussian designs.

Theorem 2.6. For $i \in [p]$, let $1 - \beta_i^{\rm opt}(\alpha; \mu)$ be the minimax power of a Gaussian random design
$X$ with covariance matrix $\Sigma \in \mathbb{R}^{p \times p}$, as per Definition 2.1. For $S \subseteq [p] \setminus \{i\}$, define
$\Sigma_{i|S} \equiv \Sigma_{ii} - \Sigma_{i,S} \Sigma_{S,S}^{-1} \Sigma_{S,i} \in \mathbb{R}$. Then, for any $\ell \in \mathbb{R}$ and $|S| < s_0$,

$$ 1 - \beta_i^{\rm opt}(\alpha; \mu) \le G\Big(\alpha, \frac{\mu}{\sigma_{\rm eff}(\ell)}\Big) + F_{n - s_0 + 1}(n - s_0 + \ell), \qquad
\sigma_{\rm eff}(\ell) \equiv \frac{\sigma}{\sqrt{\Sigma_{i|S}\,(n - s_0 + \ell)}}, $$

where $F_k(x) = P(Z_k \ge x)$, and $Z_k$ is a chi-squared random variable with $k$ degrees of freedom.

In other words, the statistical power is upper bounded by the one of testing the mean of a scalar
Gaussian random variable, with effective noise variance $\sigma_{\rm eff}^2 \approx \sigma^2 / [\Sigma_{i|S}(n - s_0)]$. (Note indeed that by
concentration of a chi-squared random variable around its mean, $\ell$ can be taken small as compared
to $n - s_0$.) The proof of this statement is to be found in Section 6.3.



The next corollary specializes the above result to the case of standard Gaussian designs. (The
proof is immediate and hence we omit it.)

Corollary 2.7. For $i \in [p]$, let $1 - \beta_i^{\rm opt}(\alpha; \mu)$ be the minimax power of a standard Gaussian design
$X$ with covariance matrix $\Sigma = I_{p \times p}$, cf. Definition 2.1. Then, for any $\xi \in [0, (3/2)\sqrt{n - s_0 + 1}]$ we
have

$$ 1 - \beta_i^{\rm opt}(\alpha; \mu) \le G\Big(\alpha, \frac{\mu\,(\sqrt{n - s_0 + 1} + \xi)}{\sigma}\Big) + e^{-\xi^2/8}. $$



It is instructive to look at the last result from a slightly different point of view. Given $\alpha \in (0,1)$
and $1 - \beta \in (\alpha, 1)$, how big does the entry $\mu$ need to be so that $1 - \beta_i^{\rm opt}(\alpha; \mu) \ge 1 - \beta$? It is easy to
check that, for any $\alpha > 0$, $u \mapsto G(\alpha, u)$ is continuous and monotone increasing with $G(\alpha, 0) = \alpha$
and $\lim_{u \to \infty} G(\alpha, u) = 1$. It follows therefore from Corollary 2.7 that any pair $(\alpha, \beta)$ as above can
be achieved only if $\mu \ge \mu_{\rm UB} = c\,\sigma/\sqrt{n}$ for some $c = c(\alpha, \beta)$. Previous work [ZZ11, Büh12] is tailored to
deterministic designs $X$ and requires $\mu \ge c \max\{\sigma s_0 \log p / n, \; \sigma/\sqrt{n}\}$ to achieve the same goal (see
Appendix A).
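To get a feel for the constant $c(\alpha, \beta)$, one can solve $G(\alpha, u) = 1 - \beta$ for $u$ numerically. The Python sketch below does this by bisection, using the monotonicity of $u \mapsto G(\alpha, u)$ noted above. The function names are hypothetical, and the conversion $\mu = u\,\sigma/\sqrt{n}$ is a simplification assumed here for illustration: it ignores the dependence on $s_0$ and the additive correction term that appear in Corollary 2.7.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal N(0, 1)


def G(alpha, u):
    """G(alpha, u) as in Eq. (12)."""
    z = _N.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - _N.cdf(z + u) - _N.cdf(z - u)


def min_normalized_signal(alpha, beta, tol=1e-10):
    """Smallest u >= 0 with G(alpha, u) >= 1 - beta, found by bisection.

    Assumes 0 < alpha < 1 - beta <= 1, so that the root is positive.
    """
    target = 1.0 - beta
    lo, hi = 0.0, 1.0
    while G(alpha, hi) < target:  # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(alpha, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


def min_detectable_mu(alpha, beta, sigma, n):
    """Illustrative scale u * sigma / sqrt(n) for the smallest detectable entry."""
    return min_normalized_signal(alpha, beta) * sigma / math.sqrt(n)
```

For instance, `min_normalized_signal(0.05, 0.1)` gives the normalized signal strength needed for power 0.9 at significance level 0.05 in the scalar Gaussian problem.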



On the other hand, the upper bounds in Lemma 2.5 and Theorem 2.6 are obtained by assuming
that the testing procedure knows $\mathrm{supp}(\theta_0) \setminus \{i\}$, and this might be very optimistic. Surprisingly,
these bounds turn out to be tight, at least in an asymptotic sense, as demonstrated in the next section.



3 The hypothesis testing procedure 
3.1 Standard Gaussian designs 

The authors of [BM12] characterize the high-dimensional behavior of the Lasso estimator for
sequences of design matrices of increasing dimensions, with independent standard Gaussian entries.
We build upon this result and propose a hypothesis testing procedure for testing the null hypothesis $H_{0,i}$.
We analyze it in the asymptotic setting and show that it achieves a nearly optimal power-significance
trade-off.

For given dimension $p$, an instance of the standard Gaussian design model is defined by the tuple
$(\theta_0, n, \sigma)$, where $\theta_0 \in \mathbb{R}^p$, $n \in \mathbb{N}$, $\sigma \in \mathbb{R}_+$. We consider sequences of instances indexed by the problem
dimension $\{(\theta_0(p), n(p), \sigma(p))\}_{p \in \mathbb{N}}$.
Definition 3.1. The sequence of instances $\{(\theta_0(p), n(p), \sigma(p))\}_{p \in \mathbb{N}}$ indexed by $p$ is said to be a
converging sequence if $n(p)/p \to \delta \in (0, \infty)$, $\sigma(p)^2/n \to \sigma_0^2$, and the empirical distribution of the
entries of $\theta_0(p)$ converges weakly to a probability measure $p_{\Theta_0}$ on $\mathbb{R}$ with bounded second moment.
Further $p^{-1} \sum_{i \in [p]} \theta_{0,i}(p)^2 \to E_{p_{\Theta_0}}\{\Theta_0^2\}$.

Note that this definition assumes that the coefficients $\theta_{0,i}$ are of order one, while the noise is
scaled as $\sigma^2(p) = \Theta(n)$. Equivalently, we could have assumed $\theta_{0,i} = \Theta(1/\sqrt{n})$ and $\sigma^2(p) = \Theta(1)$,
since the two settings only differ by a scaling of $y$. We favor the first scaling as it simplifies somewhat
the notation in the following.

It is useful to recall the following result established in [BM12].

Proposition 3.2. ([BM12]) Let $\{(\theta_0(p), n(p), \sigma(p))\}_{p \in \mathbb{N}}$ be a converging sequence of instances of the
standard Gaussian design model. Denote by $\hat\theta = \hat\theta(y, X, \lambda)$ the Lasso estimator given as per Eq. (4)
and define $\theta^u \in \mathbb{R}^p$, $r \in \mathbb{R}^n$ by letting

$$ \theta^u \equiv \hat\theta + \frac{d}{n} X^{T}(y - X\hat\theta), \qquad r \equiv \frac{d}{\sqrt{n}} (y - X\hat\theta), \tag{13} $$

with $d = (1 - \|\hat\theta\|_0/n)^{-1}$. Then, with probability one, the empirical distribution of $\{(\theta_{0,i}, \theta^u_i)\}_{i=1}^{p}$
converges weakly to the probability distribution of $(\Theta_0, \Theta_0 + \tau Z)$, for some $\tau \in \mathbb{R}$, where $Z \sim N(0,1)$,
and $\Theta_0 \sim p_{\Theta_0}$ is independent of $Z$. Furthermore, with probability one, the empirical distribution of
$\{r_i\}_{i=1}^{n}$ converges weakly to $N(0, \tau^2)$.

In other words, $\theta^u$ is an unbiased estimator of $\theta_0$, and its distribution is asymptotically
normal. Roughly speaking, the regression model (1) is asymptotically equivalent to a simpler sequence
model

$$ \theta^u = \theta_0 + \text{noise} \tag{14} $$

with noise having zero mean. Further, the construction of $\theta^u$ has an appealing geometric interpretation.
Notice that $\hat\theta$ is necessarily biased towards small $\ell_1$ norm. The minimizer in Eq. (4) must satisfy
$(1/n)\, X^{T}(y - X\hat\theta) = \lambda g$, with $g$ a subgradient of the $\ell_1$ norm at $\hat\theta$. Hence, we can rewrite
$\theta^u = \hat\theta + d \lambda g$. The bias is eliminated by modifying the estimator in the direction of increasing
$\ell_1$ norm. See Fig. 2 for an illustration.

Based on Proposition 3.2 we develop a hypothesis testing procedure as described in Table 1.

The definitions of $d$ and $\tau$ in step 2 are motivated by Proposition 3.2. Recall that $d(y - X\hat\theta)/\sqrt{n}$ is
asymptotically normal with variance $\tau^2$. In step 2, $\tau$ is estimated from data using the median absolute
deviation (MAD) estimator. This is a well-known estimator in robust statistics and is more resilient
to outliers than the sample variance [HR09].

Finally, under the null hypothesis $H_{0,i}$, the quantity $\theta^u_i/\tau$ is asymptotically $N(0,1)$. The definition
of (two-sided) p-values $P_i$ in step 4 follows. In the final step, the assigned p-value $P_i$ is used
to test the hypothesis $H_{0,i}$.
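Steps 2 to 5 of the procedure in Table 1 can be sketched in a few lines of Python, under the assumption that a Lasso solution $\hat\theta(\lambda)$ has already been computed by some solver (step 1 is not implemented here). The function name `sdl_pvalues` and the plain list-of-lists data layout are hypothetical choices for this illustration, and the MAD step uses the $\lfloor n/2 \rfloor$-th largest absolute residual as a stand-in for the median.

```python
import math
from statistics import NormalDist


def sdl_pvalues(y, X, theta_hat, alpha=0.05):
    """Steps 2-5 of the testing procedure, given a Lasso solution theta_hat.

    y: list of n floats; X: n x p list of lists; theta_hat: list of p floats.
    Assumes the support of theta_hat has fewer than n elements and that the
    median absolute residual is nonzero.
    Returns (p_values, decisions); decisions[i] == 1 means 'reject H_{0,i}'.
    """
    n, p = len(X), len(X[0])
    std_normal = NormalDist()

    # Residual vector y - X theta_hat.
    resid = [y[k] - sum(X[k][j] * theta_hat[j] for j in range(p)) for k in range(n)]

    # Step 2: d = (1 - ||theta_hat||_0 / n)^{-1}.
    support_size = sum(1 for t in theta_hat if t != 0.0)
    d = 1.0 / (1.0 - support_size / n)

    # Step 2 (continued): MAD-type estimate of tau from the rescaled residuals,
    # based on the (n/2)-th largest absolute residual.
    abs_sorted = sorted((abs(r) for r in resid), reverse=True)
    ell = max(1, n // 2)
    tau_hat = (d / math.sqrt(n)) * abs_sorted[ell - 1] / std_normal.inv_cdf(0.75)

    # Step 3: de-biased estimator theta^u = theta_hat + (d/n) X^T (y - X theta_hat).
    theta_u = [
        theta_hat[j] + (d / n) * sum(X[k][j] * resid[k] for k in range(n))
        for j in range(p)
    ]

    # Step 4: two-sided p-values.
    p_values = [2.0 * (1.0 - std_normal.cdf(abs(t) / tau_hat)) for t in theta_u]

    # Step 5: reject H_{0,i} when P_i <= alpha.
    decisions = [1 if pv <= alpha else 0 for pv in p_values]
    return p_values, decisions
```

The pure-Python loops keep the sketch dependency-free; for the problem sizes used in the experiments, a vectorized linear algebra library would be the natural replacement.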

As before, we will measure the quality of the proposed test in terms of its significance level (size)
$\alpha$ and power $1 - \beta$. Recall that $\alpha$ and $\beta$ respectively indicate the type I error (false positive) and
type II error (false negative) rates. The following theorem establishes that the $P_i$'s are indeed valid
p-values, i.e., allow one to control type I errors. Throughout, $S_0(p) = \{i \in [p] : \theta_{0,i}(p) \neq 0\}$ is the support
of $\theta_0(p)$.
Figure 2: Geometric interpretation for the construction of $\theta^u$. The bias in $\hat\theta$ is eliminated by modifying
the estimator in the direction of increasing its $\ell_1$ norm.



Theorem 3.3. Let $\{(\theta_0(p), n(p), \sigma(p))\}_{p \in \mathbb{N}}$ be a converging sequence of instances of the standard
Gaussian design model. Assume $\lim_{p \to \infty} |S_0(p)|/p = P(\Theta_0 \neq 0)$. Then, for $i \in S_0^c(p)$, we have

$$ \lim_{p \to \infty} P_{\theta_0(p)}\big(T_{i,X}(y) = 1\big) = \alpha. $$

A more general form of Theorem 3.3 (cf. Theorem 3.6) is proved in Section 6. We indeed prove
the stronger claim that the following holds true almost surely:

$$ \lim_{p \to \infty} \frac{1}{|S_0^c(p)|} \sum_{i \in S_0^c(p)} T_{i,X}(y) = \alpha. \tag{15} $$

The result of Theorem 3.3 follows then by taking the expectation of both sides of Eq. (15) and using
the bounded convergence theorem and exchangeability of the columns of $X$.

Our next theorem provides a lower bound for the power of the proposed test. In order to obtain a
non-trivial result, we need to make suitable sparsity assumptions on the parameter vectors $\theta_0 = \theta_0(p)$.
In particular, we need to assume that the non-zero entries of $\theta_0$ are lower bounded in magnitude. If
this were not the case, it would be impossible to distinguish arbitrarily small parameters $\theta_{0,i}$ from
$\theta_{0,i} = 0$. Similar assumptions are made in [MB06, ZY06, Wai09]. (The value of A can also be
predicted, but we omit it for brevity.) 

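Theorem 3.4. Let $\{(\theta_0(p), n(p), \sigma(p))\}_{p \in \mathbb{N}}$ be a converging sequence of instances under the standard
Gaussian design model. Assume that $|S_0(p)| \le \varepsilon p$, and for all $i \in S_0(p)$, $|\theta_{0,i}(p)| \ge \mu$ with
$\mu = \mu_0\, \sigma(p)/\sqrt{n(p)}$. Then, for $i \in S_0(p)$, we have

$$ \lim_{p \to \infty} P_{\theta_0(p)}\big(T_{i,X}(y) = 1\big) \ge G\Big(\alpha, \frac{\mu_0}{\tau_*}\Big), $$

where $\tau_* = \tau_*(\sigma_0, \varepsilon, \delta)$ is defined as follows

$$ \tau_*^2 = \begin{cases} \dfrac{1}{1 - M(\varepsilon)/\delta}, & \text{if } \delta > M(\varepsilon), \\ \infty, & \text{if } \delta \le M(\varepsilon). \end{cases} \tag{16} $$

Finally, $M(\varepsilon)$ is the minimax risk of the soft thresholding denoiser, with the following parametric
expression in terms of the parameter $\xi \in (0, \infty)$:

$$ \varepsilon = \frac{2(\phi(\xi) - \xi \Phi(-\xi))}{\xi + 2(\phi(\xi) - \xi \Phi(-\xi))}, \qquad
M(\varepsilon) = \frac{2\phi(\xi)}{\xi + 2(\phi(\xi) - \xi \Phi(-\xi))}, \tag{17} $$

with $\phi$ the standard normal density.

Theorem 3.4 is proved in Section 6. We indeed prove the stronger claim that the following holds
true almost surely:

$$ \lim_{p \to \infty} \frac{1}{|S_0(p)|} \sum_{i \in S_0(p)} T_{i,X}(y) \ge G\Big(\alpha, \frac{\mu_0}{\tau_*}\Big). \tag{18} $$

The result of Theorem 3.4 follows then by taking the expectation of both sides of Eq. (18) and using
exchangeability of the columns of $X$.

Table 1: A hypothesis testing procedure for testing $H_{0,i}$ under the standard Gaussian design model.

Testing hypothesis $H_{0,i}$ under the standard Gaussian design model.

Input: regularization parameter $\lambda$, significance level $\alpha$
Output: p-values $P_i$, test statistics $T_{i,X}(y)$

1: Let
   $$ \hat\theta(\lambda) = \arg\min_{\theta \in \mathbb{R}^p} \Big\{ \frac{1}{2n} \|y - X\theta\|_2^2 + \lambda \|\theta\|_1 \Big\}. $$
2: Let
   $$ d = \Big(1 - \frac{1}{n} \|\hat\theta(\lambda)\|_0\Big)^{-1}, \qquad
      \hat\tau = \frac{1}{\Phi^{-1}(0.75)} \, \frac{d}{\sqrt{n}} \, |y - X\hat\theta(\lambda)|_{(n/2)}, $$
   where for $v \in \mathbb{R}^n$, $|v|_{(\ell)}$ is the $\ell$-th largest entry in the vector $(|v_1|, \dots, |v_n|)$.
3: Let
   $$ \theta^u = \hat\theta(\lambda) + \frac{d}{n} X^{T}(y - X\hat\theta(\lambda)). $$
4: Assign the p-values $P_i$ for the test $H_{0,i}$ as follows:
   $$ P_i = 2\Big(1 - \Phi\Big(\frac{|\theta^u_i|}{\hat\tau}\Big)\Big). $$
5: The decision rule is then based on the p-values:
   $$ T_{i,X}(y) = \begin{cases} 1 & \text{if } P_i \le \alpha \quad \text{(reject the null hypothesis } H_{0,i}\text{)}, \\ 0 & \text{otherwise} \quad \text{(accept the null hypothesis)}. \end{cases} $$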

Again, it is convenient to rephrase Theorem 3.4 in terms of the minimum value of $\mu$ for which
we can achieve statistical power $1 - \beta \in (\alpha, 1)$ at significance level $\alpha$. It is known that
$M(\varepsilon) = 2\varepsilon \log(1/\varepsilon)\,(1 + O(\varepsilon))$ [DMM11]. Hence, for $n \ge 2 s_0 \log(p/s_0)\,(1 + O(s_0/p))$, we have
$\tau_* = O(1)$. Since $\lim_{u \to \infty} G(\alpha, u) = 1$, any pre-assigned statistical power can be achieved by taking
$\mu \ge C(\varepsilon, \delta)\, \sigma/\sqrt{n}$, which matches the fundamental limit established in the previous section.
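The asymptotic power bound of Theorem 3.4 can be evaluated numerically. The Python sketch below rests on two assumptions that should be read as such. First, it takes the minimax risk of soft thresholding in its usual parametric form from the literature on the Lasso phase transition, $\varepsilon(\xi) = 2h(\xi)/(\xi + 2h(\xi))$ and $M = 2\phi(\xi)/(\xi + 2h(\xi))$ with $h(\xi) = \phi(\xi) - \xi\Phi(-\xi)$. Second, it takes $\tau_*^2 = 1/(1 - M(\varepsilon)/\delta)$ for $\delta > M(\varepsilon)$. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal N(0, 1)


def _h(xi):
    """phi(xi) - xi * Phi(-xi), with phi the standard normal density."""
    return _N.pdf(xi) - xi * _N.cdf(-xi)


def eps_of_xi(xi):
    """Sparsity level eps associated with the threshold parameter xi > 0."""
    return 2.0 * _h(xi) / (xi + 2.0 * _h(xi))


def minimax_risk(eps, tol=1e-12):
    """M(eps) for eps in (0, 1): invert eps(xi) by bisection, then evaluate M."""
    lo, hi = 1e-12, 1.0
    while eps_of_xi(hi) > eps:  # eps(xi) decreases from 1 towards 0
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eps_of_xi(mid) > eps:
            lo = mid
        else:
            hi = mid
    xi = 0.5 * (lo + hi)
    return 2.0 * _N.pdf(xi) / (xi + 2.0 * _h(xi))


def tau_star(eps, delta):
    """tau_* under the assumed form of Eq. (16); infinite when delta <= M(eps)."""
    m = minimax_risk(eps)
    return math.sqrt(1.0 / (1.0 - m / delta)) if delta > m else math.inf


def G(alpha, u):
    """G(alpha, u) as in Eq. (12)."""
    z = _N.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - _N.cdf(z + u) - _N.cdf(z - u)


def asymptotic_power_bound(alpha, mu, sigma, n, p, s0):
    """Lower bound G(alpha, mu_0 / tau_*) with mu_0 = mu * sqrt(n) / sigma."""
    mu0 = mu * math.sqrt(n) / sigma
    return G(alpha, mu0 / tau_star(s0 / p, n / p))
```

Under these assumptions, `asymptotic_power_bound(0.05, 0.1, 1.0, 600, 1000, 25)` evaluates to roughly 0.59, in line with the asymptotic bound reported for the configuration (1000, 600, 25, 0.1) in Table 2.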



3.1.1 Numerical experiments 

As an illustration, we simulate from the linear model (1) with $w \sim N(0, I_{n \times n})$ and the following
configurations.

Design matrix: For pairs of values $(n, p) \in \{(300, 1000), (600, 1000), (600, 2000)\}$, the design
matrix is generated from a realization of $n$ i.i.d. rows $x_i \sim N(0, I_{p \times p})$.

Regression parameters: We consider active sets with $|S_0| = s_0 \in \{10, 20, 25, 50, 100\}$, chosen
uniformly at random from the index set $\{1, \dots, p\}$. We also consider two different strengths of active
parameters $\theta_{0,i} = \mu$ for $i \in S_0$, with $\mu \in \{0.1, 0.15\}$.
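One synthetic instance with this configuration can be generated as in the following Python sketch. It is only an illustration of the setup: the function name `simulate_instance`, the seeding scheme and the list-based data layout are assumptions made here, whereas the experiments reported in the paper were run in R.

```python
import random


def simulate_instance(n, p, s0, mu, sigma=1.0, seed=0):
    """Draw (y, X, theta0, support) from the standard Gaussian design model.

    Rows of X are i.i.d. N(0, I_p); theta0 has s0 entries equal to mu on a
    support chosen uniformly at random; the noise is i.i.d. N(0, sigma^2).
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    support = sorted(rng.sample(range(p), s0))
    theta0 = [0.0] * p
    for j in support:
        theta0[j] = mu
    # y = X theta0 + w; only the active columns contribute to the signal.
    y = [
        sum(X[k][j] * theta0[j] for j in support) + rng.gauss(0.0, sigma)
        for k in range(n)
    ]
    return y, X, theta0, support
```

For example, `simulate_instance(600, 1000, 25, 0.15)` draws one realization with the parameters used for Fig. 3.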

We examine the performance of the proposed testing procedure (cf. Table 1) at significance levels
$\alpha = 0.025, 0.05$. The experiments are done using the glmnet package in R, which fits the entire Lasso path
for linear regression models. Let $\varepsilon = s_0/p$ and $\delta = n/p$. We do not assume $\varepsilon$ is known, but rather
estimate it as $\hat\varepsilon = 0.25\, \delta / \log(2/\delta)$. The value of $\hat\varepsilon$ is half the maximum sparsity level $\varepsilon$ for the given
$\delta$ such that the Lasso estimator can correctly recover the parameter vector if the measurements
were noiseless [DMM09, BM12]. Provided it makes sense to use the Lasso at all, $\hat\varepsilon$ is thus a reasonable
ballpark estimate.
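The ballpark sparsity estimate is a one-line computation. The helper below (a hypothetical name, written for illustration) evaluates $\hat\varepsilon = 0.25\,\delta/\log(2/\delta)$ for a given pair $(n, p)$, using the natural logarithm, which is assumed here to be the intended one.

```python
import math


def sparsity_guess(n, p):
    """Ballpark estimate 0.25 * delta / log(2 / delta), with delta = n / p < 2."""
    delta = n / p
    return 0.25 * delta / math.log(2.0 / delta)


if __name__ == "__main__":
    # The three (n, p) pairs considered in the experiments.
    for n, p in [(300, 1000), (600, 1000), (600, 2000)]:
        print(n, p, round(sparsity_guess(n, p), 4))
```

The estimate depends on $(n, p)$ only through $\delta = n/p$, so the pairs (300, 1000) and (600, 2000) give the same value.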



Method                                     Type I err   Type I err   Avg. power   Avg. power
                                           (mean)       (std.)       (mean)       (std.)
Our testing procedure (1000,600,100,0.1)   0.05422      0.01069      0.44900      0.06951
Bühlmann's method (1000,600,100,0.1)       0.01089      0.00358      0.13600      0.02951
Asymptotic bound (1000,600,100,0.1)        0.05         NA           0.37692      NA
Our testing procedure (1000,600,50,0.1)    0.04832      0.00681      0.52000      0.06928
Bühlmann's method (1000,600,50,0.1)        0.01989      0.00533      0.17400      0.06670
Asymptotic bound (1000,600,50,0.1)         0.05         NA           0.51177      NA
Our testing procedure (1000,600,25,0.1)    0.06862      0.01502      0.56400      0.11384
Bühlmann's method (1000,600,25,0.1)        0.02431      0.00536      0.25600      0.06586
Asymptotic bound (1000,600,25,0.1)         0.05         NA           0.58822      NA

Table 2: Comparison between our procedure (Table [l]), Buhlmann's method |Buhl2j and the asymp 



totic bound for our procedure (established in Theorem |3.4[) on the setup described in Section 3.1.1 



The significance level is a = 0.05. The means and the standard deviations are obtained by testing 
over 10 realizations of the corresponding configuration. Here a quadruple such as (1000, 600, 50, 0.1) 
denotes the values oi p = 1000, n = 600, sq = 50, ^ = 0.1. 



The regularization parameter $\lambda$ is chosen to satisfy $\lambda d = \kappa\tau$, where $\tau$ and $d$ are determined in step 2 of the procedure. Here $\kappa = \kappa(\bar\varepsilon)$ is the tuned parameter for the worst distribution $p_{\Theta_0}$ in the sense of minimax estimation error among the $\bar\varepsilon$-sparse distributions. In [DMM09], its value is characterized for standard Gaussian design matrices.

Figure 3: Comparison between our testing procedure (Table 1), Bühlmann's method [Büh12] and the asymptotic bound for our procedure (established in Theorem 3.4). Here, $p = 1000$, $n = 600$, $s_0 = 25$, $\mu = 0.15$.

It is worth noting that the proposed test can be used for any value of $\lambda$ and its performance is robust for a wide range of values of $\lambda$. However, the above is an educated guess based on the analysis of [DMM09, BM12]. We also tried the values of $\lambda$ proposed for instance in [vdGB09, Büh12] on the basis of oracle inequalities. Finally, note that $\tau$ and $d$ implicitly depend upon $\lambda$. Since glmnet returns the entire Lasso path, the prescribed $\lambda$ above can simply be computed by applying the bisection method to the equation $\lambda d = \kappa\tau$.
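The bisection step can be sketched as follows. This is only an illustration under stated assumptions: `d_of` and `tau_of` are hypothetical callbacks that return $d$ and $\tau$ for a given $\lambda$ (for instance by reading them off a precomputed Lasso path), and the caller is assumed to supply a bracket on which $h(\lambda) = \lambda\, d(\lambda) - \kappa\,\tau(\lambda)$ changes sign.

```python
from typing import Callable


def solve_lambda(d_of: Callable[[float], float],
                 tau_of: Callable[[float], float],
                 kappa: float,
                 lam_lo: float,
                 lam_hi: float,
                 tol: float = 1e-10,
                 max_iter: int = 200) -> float:
    """Bisection on h(lam) = lam * d(lam) - kappa * tau(lam).

    d_of and tau_of are hypothetical callbacks (not part of any library).
    The bracket [lam_lo, lam_hi] must satisfy h(lam_lo) * h(lam_hi) <= 0.
    """
    def h(lam: float) -> float:
        return lam * d_of(lam) - kappa * tau_of(lam)

    h_lo, h_hi = h(lam_lo), h(lam_hi)
    if h_lo == 0.0:
        return lam_lo
    if h_hi == 0.0:
        return lam_hi
    if h_lo * h_hi > 0:
        raise ValueError("h must change sign on [lam_lo, lam_hi]")

    for _ in range(max_iter):
        mid = 0.5 * (lam_lo + lam_hi)
        h_mid = h(mid)
        if h_mid == 0.0 or (lam_hi - lam_lo) < tol:
            return mid
        if h_lo * h_mid < 0:
            lam_hi = mid              # root lies in the left half
        else:
            lam_lo, h_lo = mid, h_mid  # root lies in the right half
    return 0.5 * (lam_lo + lam_hi)
```

Bisection only needs evaluations of $d$ and $\tau$ along the path, which is why having the whole Lasso path available makes this step cheap.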

Fig. 3 shows the results of our testing procedure and the method of [Büh12] for parameter values $p = 1000$, $n = 600$, $s_0 = 25$, $\mu = 0.15$, and significance levels $\alpha \in \{0.025, 0.05\}$. Each point in the plot corresponds to one realization of this configuration (there are a total of 10 realizations). We also depict the theoretical curve $(\alpha, G(\alpha, \mu_0/\tau_*))$ predicted by Theorem 3.4. As can be seen, the experimental results are in good agreement with the theoretical curve.
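For readers who wish to trace such a curve themselves, the sketch below evaluates the two-sided power function in the form that appears at the end of the proofs in Section 6, namely $G(\alpha,u) = 1 - \{\Phi(\Phi^{-1}(1-\alpha/2) - u) - \Phi(-\Phi^{-1}(1-\alpha/2) - u)\}$. The function name is ours, and we assume this form agrees with the definition in Eq. (12), which is not reproduced here.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian, mean 0 and variance 1


def power_curve(alpha: float, u: float) -> float:
    """Two-sided power G(alpha, u) for significance level alpha in (0, 1]
    and signal-to-noise ratio u >= 0 (assumed form, see the lead-in)."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 1.0 - (_STD_NORMAL.cdf(z - u) - _STD_NORMAL.cdf(-z - u))


if __name__ == "__main__":
    for u in (0.0, 1.0, 2.0, 3.0):
        print(u, round(power_curve(0.05, u), 4))
```

At $u = 0$ the function returns $\alpha$ itself (the test rejects at the nominal rate under the null), and it increases towards 1 as $u$ grows.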

We compare our procedure with the procedure proposed in [Büh12]. Table 2 summarizes the performances of the two methods for a few configurations $(p, n, s_0, \mu)$, and $\alpha = 0.05$. Simulation results for a larger number of configurations and $\alpha = 0.05, 0.025$ are reported in Tables 8 and 9 in Appendix C. As demonstrated by these results, the method of [Büh12] is very conservative. Namely, it achieves smaller type I error than the prescribed level $\alpha$ and this comes at the cost of a smaller statistical power than our testing procedure. This is to be expected since the approach of [Büh12] is tailored to adversarial design matrices $X$.

Footnote: A distribution $p$ is $\varepsilon$-sparse if $p(\{0\}) \ge 1 - \varepsilon$.



3.2 Nonstandard Gaussian designs 

In this section, we generalize our testing procedure to nonstandard Gaussian design models where the rows of the design matrix $X$ are drawn independently from $\mathsf{N}(0, \Sigma)$. We will first consider the ideal case in which $\Sigma$ is known. Later on, we will discuss the estimation of the covariance $\Sigma$ (cf. Subroutine in Table 4). Appendix D discusses an alternative implementation that does not estimate $\Sigma$ but instead bounds the effect of unknown $\Sigma$.

For given dimension $p$, an instance of the nonstandard Gaussian design model is defined by the tuple $(\Sigma, \theta_0, n, \sigma)$, where $\Sigma \in \mathbb{R}^{p\times p}$, $\Sigma \succ 0$, $\theta_0 \in \mathbb{R}^p$, $n \in \mathbb{N}$, $\sigma \in \mathbb{R}_+$. We will be interested in the asymptotic properties of sequences of instances indexed by the problem dimension $\{(\Sigma(p), \theta_0(p), n(p), \sigma(p))\}_{p\in\mathbb{N}}$. Motivated by Proposition 3.2, we define a property of a sequence of instances that we refer to as standard distributional limit.

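Definition 3.5. A sequence of instances $\{(\Sigma(p), \theta_0(p), n(p), \sigma(p))\}_{p\in\mathbb{N}}$ indexed by $p$ is said to have an (almost sure) standard distributional limit if there exist $\tau, d \in \mathbb{R}$ such that the following holds. Denote by $\hat\theta = \hat\theta(y, X, \lambda)$ the Lasso estimator as defined earlier, and define $\hat\theta^u \in \mathbb{R}^p$, $r \in \mathbb{R}^n$ by letting

$$\hat\theta^u \equiv \hat\theta + \frac{d}{n}\,\Sigma^{-1}X^{\mathsf T}(y - X\hat\theta), \qquad r \equiv \frac{d}{\sqrt{n}}\,(y - X\hat\theta). \qquad (19)$$

Let $v_i = (\theta_{0,i}, \hat\theta^u_i, (\Sigma^{-1})_{i,i})$ for $1 \le i \le p$, and let $\nu^{(p)}$ be the empirical distribution of $\{v_i\}_{i=1}^p$ defined as

$$\nu^{(p)} = \frac{1}{p}\sum_{i=1}^{p}\delta_{v_i}, \qquad (20)$$

where $\delta_{v_i}$ denotes the Dirac delta function centered at $v_i$. Then, with probability one, the empirical distribution $\nu^{(p)}$ converges weakly to a probability measure $\nu$ on $\mathbb{R}^3$ as $p \to \infty$. Here, $\nu$ is the probability distribution of $(\Theta_0, \Theta_0 + \tau\Upsilon^{1/2}Z, \Upsilon)$, where $Z \sim \mathsf{N}(0,1)$, and $\Theta_0$ and $\Upsilon$ are random variables independent of $Z$. Furthermore, with probability one, the empirical distribution of $\{r_i\}_{i=1}^n$ converges weakly to $\mathsf{N}(0, \tau^2)$.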
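As a concrete reading of Eq. (19), the following sketch computes $\hat\theta^u$ and $r$ from a given estimate $\hat\theta$ using plain Python lists. The function name and argument layout are ours; the inverse covariance is assumed to be supplied by the caller, and no attempt is made at numerical efficiency.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def debias(theta_hat: Sequence[float], X: Matrix, y: Sequence[float],
           sigma_inv: Matrix, d: float) -> Tuple[Vector, Vector]:
    """Return (theta_u, r) with
    theta_u = theta_hat + (d/n) * Sigma^{-1} X^T (y - X theta_hat),
    r       = (d/sqrt(n)) * (y - X theta_hat).
    """
    n, p = len(X), len(theta_hat)
    # Residual vector y - X theta_hat (length n).
    resid = [y[a] - sum(X[a][j] * theta_hat[j] for j in range(p)) for a in range(n)]
    # Correlation of each column of X with the residual: X^T resid (length p).
    g = [sum(X[a][j] * resid[a] for a in range(n)) for j in range(p)]
    theta_u = [theta_hat[i] + (d / n) * sum(sigma_inv[i][j] * g[j] for j in range(p))
               for i in range(p)]
    r = [d / math.sqrt(n) * e for e in resid]
    return theta_u, r
```

With $\Sigma = \mathrm{I}$ the correction reduces to adding $(d/n)X^{\mathsf T}(y - X\hat\theta)$, which is the standard Gaussian design case.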

Proving the standard distributional limit for specific families of instance sequences $\{(\Sigma(p), \theta_0(p), n(p), \sigma(p))\}_{p\in\mathbb{N}}$ is an outstanding mathematical challenge. In Section 4 we discuss a number of cases in which this can be established rigorously. We also outline a non-rigorous calculation using statistical physics methods that suggests significantly broader validity. Assuming validity of the standard distributional limit, we generalize our hypothesis testing procedure to nonstandard Gaussian design models. In order to stress the use of the standard distributional limit, we refer to our test as SDL-test.

The hypothesis testing procedure (SDL-test) is described in Table 3. Our presentation of the SDL-test focuses on using the exact covariance $\Sigma$ to emphasize the validity of the proposed p-values. Parameters $d$ and $\tau$ in step 2 are defined in the same manner as for standard Gaussian designs.



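Notice that Definition 3.5 does not provide any explicit prescription for the value of $d$. Its definition in step 2 is indeed motivated by the general theory discussed in Section 4.

Table 3: SDL-test for testing hypothesis $H_{0,i}$ under nonstandard Gaussian design model.

SDL-test: Testing hypothesis $H_{0,i}$ under nonstandard Gaussian design model.

Input: regularization parameter $\lambda$, significance level $\alpha$, covariance matrix $\Sigma$
Output: p-values $P_i$, test statistics $T_{i,X}(y)$

1: Let
$$\hat\theta(\lambda) = \arg\min_{\theta\in\mathbb{R}^p}\Big\{\frac{1}{2n}\|y - X\theta\|^2 + \lambda\|\theta\|_1\Big\}.$$

2: Let
$$d = \Big(1 - \frac{1}{n}\|\hat\theta(\lambda)\|_0\Big)^{-1}, \qquad \tau = \frac{1}{\Phi^{-1}(0.75)}\,\frac{d}{\sqrt{n}}\,\big|y - X\hat\theta(\lambda)\big|_{(n/2)},$$
where for $v \in \mathbb{R}^n$, $|v|_{(\ell)}$ is the $\ell$-th largest entry in the vector $(|v_1|, \dots, |v_n|)$.

3: Let
$$\hat\theta^u = \hat\theta(\lambda) + \frac{d}{n}\,\Sigma^{-1}X^{\mathsf T}\big(y - X\hat\theta(\lambda)\big).$$

4: Assign the p-values $P_i$ for the test $H_{0,i}$ as follows:
$$P_i = 2\bigg(1 - \Phi\bigg(\frac{|\hat\theta^u_i|}{\tau\,[(\Sigma^{-1})_{i,i}]^{1/2}}\bigg)\bigg).$$

5: The decision rule is then based on the p-values:
$$T_{i,X}(y) = \begin{cases} 1 & \text{if } P_i \le \alpha \ \text{(reject the null hypothesis } H_{0,i}\text{)},\\ 0 & \text{otherwise (accept the null hypothesis).}\end{cases}$$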



Under the assumption of a standard distributional limit and assuming the null hypothesis $H_{0,i}$, the quantity $\hat\theta^u_i/(\tau[(\Sigma^{-1})_{i,i}]^{1/2})$ is asymptotically $\mathsf{N}(0,1)$, whence the definition of (two-sided) p-values $P_i$ follows as in step 4. In the final step, the assigned p-value $P_i$ is used to test the hypothesis $H_{0,i}$.
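The p-value and decision steps are simple enough to sketch directly. The code below is an illustration, not the authors' implementation: the function names are ours, the Lasso fit and the debiased estimate $\hat\theta^u$ are assumed to be computed elsewhere, and the scale parameters use $d = (1 - \|\hat\theta\|_0/n)^{-1}$ together with a median-absolute-residual estimate of $\tau$, which is our reading of step 2 and should be checked against Table 1.

```python
import math
from statistics import NormalDist, median
from typing import List, Sequence, Tuple

_PHI = NormalDist()  # standard Gaussian


def scale_parameters(residuals: Sequence[float], num_nonzero: int) -> Tuple[float, float]:
    """Return (d, tau) from the Lasso residuals y - X theta_hat.

    Assumed forms: d = 1 / (1 - num_nonzero / n) and
    tau = (d / sqrt(n)) * median(|residuals|) / Phi^{-1}(0.75).
    """
    n = len(residuals)
    d = 1.0 / (1.0 - num_nonzero / n)
    mad = median(abs(e) for e in residuals)
    tau = d / math.sqrt(n) * mad / _PHI.inv_cdf(0.75)
    return d, tau


def sdl_pvalues(theta_u: Sequence[float], tau: float,
                sigma_inv_diag: Sequence[float]) -> List[float]:
    """Two-sided p-values P_i = 2 * (1 - Phi(|theta_u_i| / (tau * sqrt((Sigma^{-1})_ii))))."""
    return [2.0 * (1.0 - _PHI.cdf(abs(t) / (tau * math.sqrt(s))))
            for t, s in zip(theta_u, sigma_inv_diag)]


def sdl_decisions(pvalues: Sequence[float], alpha: float) -> List[int]:
    """Return 1 (reject H_{0,i}) when P_i <= alpha, and 0 otherwise."""
    return [1 if p <= alpha else 0 for p in pvalues]
```

Dividing the median absolute residual by $\Phi^{-1}(0.75)$ turns it into a consistent estimate of a Gaussian standard deviation, which is robust to the minority of large residuals.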



The following theorem is a generalization of Theorem 3.3 to nonstandard Gaussian designs under the standard distributional limit.

Theorem 3.6. Let $\{(\Sigma(p), \theta_0(p), n(p), \sigma(p))\}_{p\in\mathbb{N}}$ be a sequence of instances for which a standard distributional limit holds. Further assume $\lim_{p\to\infty}|S_0(p)|/p = \mathbb{P}(\Theta_0 \neq 0)$. Then,

$$\lim_{p\to\infty}\frac{1}{|S_0^c(p)|}\sum_{i\in S_0^c(p)}\mathbb{P}_{\theta_0(p)}\big(T_{i,X}(y) = 1\big) = \alpha.$$

The proof of Theorem 3.6 is deferred to Section 6. In the proof, we show the stronger result that the following holds true almost surely:

$$\lim_{p\to\infty}\frac{1}{|S_0^c(p)|}\sum_{i\in S_0^c(p)} T_{i,X}(y) = \alpha. \qquad (21)$$

The result of Theorem 3.6 then follows by taking the expectation of both sides of Eq. (21) and using the bounded convergence theorem.

The following theorem characterizes the power of SDL-test for general $\Sigma$, and under the assumption that a standard distributional limit holds.

Theorem 3.7. Let $\{(\Sigma(p), \theta_0(p), n(p), \sigma(p))\}_{p\in\mathbb{N}}$ be a sequence of instances with standard distributional limit. Assume (without loss of generality) $\sigma(p) = \sqrt{n(p)}$, and further $|\theta_{0,i}(p)|/[(\Sigma^{-1})_{i,i}]^{1/2} \ge \mu_0$ for all $i \in S_0(p)$. Then,

$$\lim_{p\to\infty}\frac{1}{|S_0(p)|}\sum_{i\in S_0(p)}\mathbb{P}_{\theta_0(p)}\big(T_{i,X}(y) = 1\big) \ge G\Big(\alpha, \frac{\mu_0}{\tau}\Big).$$

Theorem 3.7 is proved in Section 6. We indeed prove the stronger result that the following holds true almost surely:

$$\lim_{p\to\infty}\frac{1}{|S_0(p)|}\sum_{i\in S_0(p)} T_{i,X}(y) \ge G\Big(\alpha, \frac{\mu_0}{\tau}\Big).$$

We also notice that in contrast to Theorem 3.4, where $\tau_*$ has an explicit formula that leads to an analytical lower bound for the power (for a suitable choice of $\lambda$), in Theorem 3.7 $\tau$ depends upon $\lambda$ implicitly and can be estimated from the data as in step 2 of the SDL-test procedure. The result of Theorem 3.7 holds for any value of $\lambda$.

Notice that in general the exact covariance $\Sigma$ is not available and we need to use an estimate of it in SDL-test. There are several high-dimensional covariance estimation methods that provide a consistent estimate $\hat\Sigma$, under suitable structural assumptions on $\Sigma$. For instance, if $\Sigma$ is sparse, $\hat\Sigma$ can be constructed by thresholding the empirical covariance, cf. Table 4. Note that the Lasso is unlikely to perform well if the columns of $X$ are highly correlated and hence the assumption of sparse $\Sigma$ is very natural. If the inverse covariance $\Sigma^{-1}$ is sparse, the graphical model method of [MB06] can be used instead.

Appendix D describes an alternative covariance-free procedure that only uses bounds on $\Sigma$, where the bounds are estimated from the data. In our numerical experiments and comparisons with other methods, we use the estimated covariance returned by Subroutine. The p-value computation appears to be fairly robust with respect to errors in the estimation of $\Sigma$.
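The thresholding subroutine of Table 4 can be sketched in a few lines. This is a hedged illustration rather than the code used for the experiments: the function names are ours, and although Table 4 describes $\sigma_1$ and $\sigma_2$ as variances, the sketch uses standard deviations in the $3\sigma$ rules, which is an assumption on our part about the intended scale.

```python
from statistics import pstdev
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def empirical_covariance(X: Matrix) -> List[List[float]]:
    """Return (1/n) X^T X for an n-by-p design matrix given as a list of rows."""
    n, p = len(X), len(X[0])
    return [[sum(X[a][i] * X[a][j] for a in range(n)) / n for j in range(p)]
            for i in range(p)]


def threshold_covariance(C: Matrix) -> List[List[float]]:
    """Hard-threshold the entries of C following the steps of Table 4.

    Assumption: sigma_1 and sigma_2 are standard deviations, not variances.
    """
    entries = [c for row in C for c in row]
    sigma1 = pstdev(entries)                                  # spread of all entries
    bulk = [c for c in entries if abs(c) <= 3.0 * sigma1]     # the set A
    if not bulk:                                              # degenerate input: nothing to fit
        return [list(row) for row in C]
    sigma2 = pstdev(bulk)                                     # normal fit to the bulk
    return [[c if abs(c) >= 3.0 * sigma2 else 0.0 for c in row] for row in C]
```

The two-stage rule first discards the obviously large entries, so that the noise scale $\sigma_2$ is estimated from the bulk of near-zero entries only. The procedure is only meaningful when $p$ is large and most entries of $\Sigma$ vanish.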



3.2.1 Numerical experiments 



We consider the same setup as the one in Section 3.1.1 except that the rows of the design matrix are independently $x_i \sim \mathsf{N}(0, \Sigma)$. Here $\Sigma \in \mathbb{R}^{p\times p}$ is a circulant matrix with $\Sigma_{ii} = 1$, $\Sigma_{jk} = 0.1$ for $j \neq k$, $|j - k| \le 5$ and zero everywhere else. (The difference between indices is understood modulo $p$.)
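A short sketch of this covariance structure follows. The function name and the `rho` and `bandwidth` parameters are ours; in particular, we assume the band includes index distances up to and including 5, which the garbled inequality in the source does not settle beyond doubt.

```python
from typing import List


def circulant_covariance(p: int, rho: float = 0.1, bandwidth: int = 5) -> List[List[float]]:
    """Circulant matrix with unit diagonal and entries rho whenever the
    circular distance between the two indices is between 1 and bandwidth."""
    sigma = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            # Index difference understood modulo p (circular distance).
            dist = min((j - k) % p, (k - j) % p)
            if dist == 0:
                sigma[j][k] = 1.0
            elif dist <= bandwidth:
                sigma[j][k] = rho
    return sigma
```

Each row then has one unit entry and $2 \times 5 = 10$ entries equal to $0.1$, so the matrix is strictly diagonally dominant and hence positive definite.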



In Fig. 4(a), we compare SDL-test with the procedure proposed in [Büh12]. While the type I errors of SDL-test are in good match with the chosen significance level $\alpha$, Bühlmann's method is overly conservative. It results in significantly smaller type I errors than $\alpha$ and smaller average power in return. Table 5 summarizes the performances of the two methods for a few configurations $(p, n, s_0, \mu)$, and $\alpha = 0.05$. Simulation results for a larger number of configurations and $\alpha = 0.05, 0.025$ are reported in Tables 10 and 11 in Appendix C.

Table 4: Subroutine for estimating covariance $\Sigma$.

Subroutine: Estimating covariance matrix $\Sigma$

Input: Design matrix $X$
Output: Estimate $\hat\Sigma$

1: Let $C = (1/n)X^{\mathsf T}X \in \mathbb{R}^{p\times p}$.
2: Let $\sigma_1$ be the empirical variance of the entries in $C$, and let $A = \{C_{ij} : |C_{ij}| \le 3\sigma_1\}$;
3: Fit a normal distribution to the entries in $A$; let $\sigma_2$ be the variance of this distribution;
4: Construct the estimate $\hat\Sigma$ as follows:
$$\hat\Sigma_{ij} = C_{ij}\,\mathbb{I}\big(|C_{ij}| \ge 3\sigma_2\big). \qquad (22)$$

Figure 4: Simulation results for the setting in Section 3.2.1 and the configuration $p = 2000$, $n = 600$, $s_0 = 50$, $\mu = 0.1$. (a) Comparison between SDL-test and Bühlmann's method [Büh12]. (b) Normalized histograms of $Z_{S_0}$ (in red) and $Z_{S_0^c}$ (in white) for one realization.



Let $Z = (z_i)_{i=1}^p$ denote the vector with $z_i \equiv \hat\theta^u_i/(\tau[(\Sigma^{-1})_{ii}]^{1/2})$. In Fig. 4(b) we plot the normalized histograms of $Z_{S_0}$ (in red) and $Z_{S_0^c}$ (in white), where $Z_{S_0}$ and $Z_{S_0^c}$ respectively denote the restrictions of $Z$ to the active set $S_0$ and the inactive set $S_0^c$. The plot clearly exhibits the fact that $Z_{S_0^c}$ has (asymptotically) standard normal distribution, and the histogram of $Z_{S_0}$ appears as a distinguishable bump. This is the core intuition in defining SDL-test.






Method                                   Type I err   Type I err   Avg. power   Avg. power
                                         (mean)       (std.)       (mean)       (std.)
SDL-test (1000,600,100,0.1)              U.Uo ( oo    U.Ui ( ZU    U.4ooUU      U.Uo4oo
Bühlmann's method (1000,600,100,0.1)     U.UUobb      U.UU4iD      U.iiUUU      U.Uzozo
Lower bound (1000,600,100,0.1)           0.05         NA           0.45685      0.04540
SDL-test (1000,600,50,0.1)               0.04968      0.00997      0.50800      0.05827
Bühlmann's method (1000,600,50,0.1)      0.01642      0.00439      0.21000      0.04738
Lower bound (1000,600,50,0.1)            0.05         NA           0.50793      0.03545
SDL-test (1000,600,25,0.1)               0.05979      0.01435      0.55200      0.08390
Bühlmann's method (1000,600,25,0.1)      0.02421      0.00804      0.22400      0.10013
Lower bound (1000,600,25,0.1)            0.05         NA           0.54936      0.06176

Table 5: Comparison between SDL-test, Bühlmann's method [Büh12] and the lower bound for SDL-test power (cf. Theorem 3.7) on the setup described in Section 3.2.1. The significance level is $\alpha = 0.05$. The means and the standard deviations are obtained by testing over 10 realizations of the corresponding configuration. Here a quadruple such as (1000, 600, 50, 0.1) denotes the values of $p = 1000$, $n = 600$, $s_0 = 50$, $\mu = 0.1$.



4 Generalization and discussion 

In previous sections we described our hypothesis testing procedure (SDL-test) using the Lasso estimator. The Lasso estimator is particularly useful when one seeks sparse parameter vectors $\theta$ satisfying Eq. (2). In general, penalty functions other than the $\ell_1$ norm might be used based on the prior knowledge about $\theta_0$. Here, we consider regularized least squares estimators of the form

$$\hat\theta(y, X) = \arg\min_{\theta\in\mathbb{R}^p}\Big\{\frac{1}{2n}\|y - X\theta\|^2 + J(\theta)\Big\}, \qquad (23)$$

with $J(\theta)$ being a convex separable penalty function; namely for a vector $\theta \in \mathbb{R}^p$, we have $J(\theta) = J_1(\theta_1) + \dots + J_p(\theta_p)$, where $J_\ell : \mathbb{R} \to \mathbb{R}$ is a convex function. Important instances from this ensemble of estimators are Ridge regression ($J(\theta) = \lambda\|\theta\|^2/2$) and the Lasso ($J(\theta) = \lambda\|\theta\|_1$).

Assuming the general penalty function $J(\theta)$, the standard distributional limit is defined similarly to Definition 3.5, except that the Lasso estimator $\hat\theta = \hat\theta(y, X, \lambda)$ is replaced by the estimator $\hat\theta = \hat\theta(y, X)$ given by Eq. (23).

Generalizing SDL-test for convex separable penalty functions $J(\theta)$ is immediate. The only required modification concerns the definition of $d$ in step 2. We let $d$ be the unique positive solution of the following equation

$$1 = \frac{1}{d} + \frac{1}{n}\,\mathrm{Trace}\Big\{\big(\mathrm{I} + d\,\Sigma^{-1/2}\nabla^2 J(\hat\theta)\,\Sigma^{-1/2}\big)^{-1}\Big\}, \qquad (24)$$

where $\nabla^2 J(\hat\theta)$ denotes the Hessian, which is diagonal since $J$ is separable. If $J$ is nondifferentiable, then we formally set $[\nabla^2 J(\hat\theta)]_{ii} = \infty$ for all the coordinates $i$ such that $J$ is non-differentiable at $\hat\theta_i$. It can be checked that this definition is well posed and that it yields the previous choice for $J(\theta) = \lambda\|\theta\|_1$.
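Two special cases make Eq. (24) tangible; both are worked out here under the simplifying assumption $\Sigma = \mathrm{I}$, and the function names are ours. For the Lasso, coordinates with $\hat\theta_i = 0$ have infinite Hessian entries and contribute nothing to the trace, while the remaining coordinates contribute one each, so the trace equals $\|\hat\theta\|_0$ and $d = (1 - \|\hat\theta\|_0/n)^{-1}$. For Ridge regression, $\nabla^2 J = \lambda\,\mathrm{I}$ and the equation becomes $1 = 1/d + (p/n)/(1 + d\lambda)$, which the sketch solves by bisection.

```python
def d_lasso(num_nonzero: int, n: int) -> float:
    """Closed form for the Lasso: d = 1 / (1 - ||theta_hat||_0 / n).

    Requires num_nonzero < n.
    """
    return 1.0 / (1.0 - num_nonzero / n)


def d_ridge(lam: float, n: int, p: int, iters: int = 200) -> float:
    """Unique positive root of 1 = 1/d + (p/n) / (1 + d*lam), assuming Sigma = I.

    The right-hand side decreases from +infinity to 0 as d grows, so for
    lam > 0 there is exactly one crossing, located here by bisection.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")

    def f(d: float) -> float:
        return 1.0 / d + (p / n) / (1.0 + d * lam) - 1.0

    lo, hi = 1e-12, 1.0
    while f(hi) > 0:          # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a check, with $\lambda = 1$ and $n = p$ the Ridge equation reduces to $d^2 - d - 1 = 0$, whose positive root is the golden ratio $(1 + \sqrt{5})/2$.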
We next discuss the validity of the standard distributional limit and the rationale for the above choice of $d$.

1. For $\Sigma = \mathrm{I}_{p\times p}$ and $J(\theta) = \lambda\|\theta\|_1$, [BM11, BM12] prove that indeed the standard distributional limit holds.

2. For $\Sigma = \mathrm{I}_{p\times p}$ and general separable $J(\theta)$, a formal proof of the same statement does not exist. However, the results of Talagrand on the Shcherbina-Tirozzi model (cf. [Tal10, Theorem 3.2.14] and comment below) imply the claim for strictly convex $J(\theta)$ under the assumption that a set of three non-linear equations admits a unique solution. This can be checked on a case-by-case basis. Additional evidence is provided by the remark that the AMP algorithm to compute $\hat\theta$ satisfies the standard distributional limit at each iteration.

3. The work in [Ran11, JM12] extends the analysis of AMP to a larger class of algorithms called G-AMP. This suggests that the standard distributional limit can be shown to hold for special block-diagonal matrices $\Sigma$ as well.

4. Finally, the broadest domain of validity of the standard distributional limit can be established using the replica method from statistical physics. This is a non-rigorous but mathematically sophisticated technique, originally devised to treat mean-field models of spin glasses [MPV87]. Over the last 20 years, its domain was considerably extended [MM09], and in fact it was already successfully applied to estimation problems under the noisy linear model (1) in the case $\Sigma = \mathrm{I}$ [Tan02, GV05, KWT09, RFG09]. While its validity was confirmed by rigorous analysis in a number of examples, developing an equally powerful probabilistic method remains an outstanding challenge.



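Replica Method Claim 4.1. Assume the sequence of instances $\{(\Sigma(p), \theta_0(p), n(p), \sigma(p))\}_{p\in\mathbb{N}}$ to be such that, as $p \to \infty$: (i) $n(p)/p \to \delta > 0$; (ii) $\sigma(p)^2/n(p) \to \sigma_0^2 > 0$; (iii) the sequence of functions

$$\mathcal{E}^{(p)}(a, b) \equiv \frac{1}{p}\,\mathbb{E}\min_{\theta\in\mathbb{R}^p}\Big\{\frac{b}{2}\,\big\|\theta - \theta_0 - \sqrt{a}\,\Sigma^{-1/2}z\big\|_\Sigma^2 + J(\theta)\Big\}, \qquad (25)$$

with $\|v\|_\Sigma^2 \equiv \langle v, \Sigma v\rangle$ and $z \sim \mathsf{N}(0, \mathrm{I}_{p\times p})$, admits a differentiable limit $\mathcal{E}(a, b)$ on $\mathbb{R}_+ \times \mathbb{R}_+$, with $\nabla\mathcal{E}^{(p)}(a, b) \to \nabla\mathcal{E}(a, b)$. Then the sequence has a standard distributional limit. Further let

$$\eta_b(y) \equiv \arg\min_{\theta\in\mathbb{R}^p}\Big\{\frac{b}{2}\,\|\theta - y\|_\Sigma^2 + J(\theta)\Big\}, \qquad (26)$$

$$\mathsf{E}_1(a, b) \equiv \lim_{p\to\infty}\frac{1}{p}\,\mathbb{E}\Big\{\big\|\eta_b(\theta_0 + \sqrt{a}\,\Sigma^{-1/2}z) - \theta_0\big\|_\Sigma^2\Big\}, \qquad (27)$$

where the limit exists by the above assumptions on the convergence of $\mathcal{E}^{(p)}(a, b)$. Then, the parameters $\tau$ and $d$ of the standard distributional limit are obtained by solving Eq. (24) for $d$ and

$$\tau^2 = \sigma_0^2 + \frac{1}{\delta}\,\mathsf{E}_1(\tau^2, 1/d). \qquad (28)$$

We refer to Appendix B for the related statistical physics calculations.

It is worth stressing that the convergence assumption for the sequence $\mathcal{E}^{(p)}(a, b)$ is quite mild, and is satisfied by a large family of covariance matrices. For instance, it can be proved that it holds for block-diagonal matrices $\Sigma$ as long as the empirical distribution of the blocks converges.

Figure 5: Parameter vector $\theta_0$ for the communities data set.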



5 Real data application 

We tested our method on the UCI communities and crimes dataset [FA10]. This concerns the prediction of the rate of violent crime in different communities within the US, based on other demographic attributes of the communities. The dataset consists of a response variable along with 122 predictive attributes for 1994 communities. Covariates are quantitative, including e.g., the fraction of urban population or the median family income. We consider the linear model introduced in Section 1 and hypotheses $H_{0,i}$. Rejection of $H_{0,i}$ indicates that the $i$-th attribute is significant in predicting the response variable.

We perform the following preprocessing steps: (i) Each missing value is replaced by the mean of the non-missing values of that attribute for other communities; (ii) We eliminate 16 attributes to make the ensemble of the attribute vectors linearly independent. Thus we obtain a design matrix $X_{\rm tot} \in \mathbb{R}^{n_{\rm tot}\times p}$ with $n_{\rm tot} = 1994$ and $p = 106$; (iii) We normalize each column of the resulting design matrix to have mean zero and $\ell_2$ norm equal to $\sqrt{n_{\rm tot}}$.
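Steps (i) and (iii) of the preprocessing can be sketched as below. This is an illustration with hypothetical function names; missing values are assumed to be encoded as `None`, every column is assumed to have at least one observed value and to be non-constant, and step (ii), the removal of linearly dependent attributes, is not covered.

```python
import math
from typing import List, Optional, Sequence


def impute_column_means(rows: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """Replace each missing entry (None) by the mean of the observed
    values in the same column."""
    p = len(rows[0])
    means = []
    for j in range(p):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else float(r[j]) for j in range(p)]
            for r in rows]


def normalize_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Center each column and rescale it to have l2 norm sqrt(n)."""
    n, p = len(rows), len(rows[0])
    out = [[float(v) for v in r] for r in rows]
    for j in range(p):
        col = [out[a][j] for a in range(n)]
        mean = sum(col) / n
        centered = [c - mean for c in col]
        norm = math.sqrt(sum(c * c for c in centered))
        scale = math.sqrt(n) / norm   # fails for constant columns (norm == 0)
        for a in range(n):
            out[a][j] = centered[a] * scale
    return out
```

After normalization every column has empirical second moment equal to one, which puts all attributes on the same scale before the Lasso is applied.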

In order to evaluate various hypothesis testing procedures, we need to know the true significant variables. To this end, we let $\theta_0 = (X_{\rm tot}^{\mathsf T}X_{\rm tot})^{-1}X_{\rm tot}^{\mathsf T}y$ be the least-squares solution. We take $\theta_0$ as the true parameter vector obtained from the whole data set. Fig. 5 shows the entries of $\theta_0$. Clearly, only a few entries have non-negligible values, which correspond to the significant attributes. In computing type I errors and powers, we take the elements in $\theta_0$ with magnitude larger than 0.04 as active and the others as inactive.

We take random subsamples of size $n = 84$ from the communities. We compare SDL-test with Bühlmann's method over 20 realizations and significance levels $\alpha = 0.01, 0.025, 0.05$. Type I errors and powers are computed by comparing to $\theta_0$. Table 6 summarizes the results. As can be seen, Bühlmann's method is conservative, yielding zero type I error and smaller power than SDL-test in return.
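The evaluation rule just described can be written down in a few lines. The sketch below uses a hypothetical function name; it takes the p-values from one subsample and the reference vector $\theta_0$, labels coordinates as active when $|\theta_{0,i}| > 0.04$, and returns the empirical type I error and power at level $\alpha$.

```python
from typing import Sequence, Tuple


def error_rates(pvalues: Sequence[float], theta0: Sequence[float],
                alpha: float, active_threshold: float = 0.04) -> Tuple[float, float]:
    """Return (type I error, power) for the rule 'reject H_{0,i} if P_i <= alpha'.

    Type I error: fraction of inactive coordinates that are rejected.
    Power: fraction of active coordinates that are rejected.
    """
    active = [abs(t) > active_threshold for t in theta0]
    reject = [p <= alpha for p in pvalues]
    n_active = sum(active)
    n_inactive = len(active) - n_active
    false_pos = sum(1 for r, a in zip(reject, active) if r and not a)
    true_pos = sum(1 for r, a in zip(reject, active) if r and a)
    type1 = false_pos / n_inactive if n_inactive else float("nan")
    power = true_pos / n_active if n_active else float("nan")
    return type1, power
```

Averaging these two numbers over the random subsamples gives quantities of the kind reported in Table 6.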

In Table 7 we report the relevant features obtained from the whole dataset as described above, and the relevant features predicted by SDL-test and Bühlmann's method from one random subsample of communities of size $n = 84$. Feature descriptions are available in [FA10].

Method                          Type I err     Avg. power
                                (mean)         (mean)
SDL-test (α = 0.05)             0.0172043      0.4807692
Bühlmann's method (α = 0.05)    0              0.1423077
SDL-test (α = 0.025)            0.01129032     0.4230769
Bühlmann's method (α = 0.025)   0              0.1269231
SDL-test (α = 0.01)             0.008602151    0.3576923
Bühlmann's method (α = 0.01)    0              0.1076923

Table 6: Simulation results for the communities data set.

Figure 6: Normalized histogram of $Z_{S_0}$ (in red) and $Z_{S_0^c}$ (in white) for the communities data set.



Finally, in Fig. 6 we plot the normalized histograms of $Z_{S_0}$ (in red) and $Z_{S_0^c}$ (in white). Recall that $Z = (z_i)_{i=1}^p$ denotes the vector with $z_i \equiv \hat\theta^u_i/(\tau[(\Sigma^{-1})_{ii}]^{1/2})$. Further, $Z_{S_0}$ and $Z_{S_0^c}$ respectively denote the restrictions of $Z$ to the active set $S_0$ and the inactive set $S_0^c$. This plot demonstrates that $Z_{S_0^c}$ has roughly standard normal distribution as predicted by the theory.

Relevant features (whole dataset):
racePctHisp, PctTeen2Par, PctImmigRecent, PctImmigRec8, PctImmigRec10, PctNotSpeakEnglWell, OwnOccHiQuart, NumStreet, PctSameState85, LemasSwFTFieldPerPop, LemasTotReqPerPop, RacialMatchCommPol, PolicOperBudg

α = 0.01
Relevant features (SDL-test):
racePctHisp, PctTeen2Par, PctImmigRecent, PctImmigRec8, PctImmigRec10, PctNotSpeakEnglWell, OwnOccHiQuart, NumStreet, PctSameState85, LemasSwFTFieldPerPop, LemasTotReqPerPop, RacialMatchCommPol, PolicOperBudg
Relevant features (Bühlmann's method):
racePctHisp, PctSameState85

α = 0.025
Relevant features (SDL-test):
racePctHisp, PctTeen2Par, PctImmigRecent, PctImmigRec8, PctImmigRec10, PctNotSpeakEnglWell, PctHousOccup, OwnOccHiQuart, NumStreet, PctSameState85, LemasSwFTFieldPerPop, LemasTotReqPerPop, RacialMatchCommPol, PolicOperBudg
Relevant features (Bühlmann's method):
racePctHisp, PctSameState85

α = 0.05
Relevant features (SDL-test):
racePctHisp, PctUnemployed, PctTeen2Par, PctImmigRecent, PctImmigRec8, PctImmigRec10, PctNotSpeakEnglWell, PctHousOccup, OwnOccHiQuart, NumStreet, PctSameState85, LemasSwornFT, LemasSwFTFieldPerPop, LemasTotReqPerPop, RacialMatchCommPol, PctPolicWhite
Relevant features (Bühlmann's method):
racePctHisp, PctSameState85

Table 7: The relevant features (using the whole dataset) and the relevant features predicted by SDL-test and Bühlmann's method [Büh12] for a random subsample of size $n = 84$ from the communities. The false positive predictions are in red.



6 Proofs 

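6.1 Proof of Lemma 2.4

Fix $\alpha \in [0, 1]$, $\mu > 0$, and assume that the minimum error rate for type II errors in testing hypothesis $H_{0,i}$ at significance level $\alpha$ is $\beta = \beta_i^{\rm opt}(\alpha; \mu)$. Further fix $\xi > 0$ arbitrarily small. By definition there exists a statistical test $T_{i,X} : \mathbb{R}^n \to \{0, 1\}$ such that $\mathbb{P}_\theta(T_{i,X}(y) = 1) \le \alpha$ for any $\theta \in \mathcal{R}_0$ and $\mathbb{P}_\theta(T_{i,X}(y) = 0) \le \beta + \xi$ for any $\theta \in \mathcal{R}_1$ (with $\mathcal{R}_0, \mathcal{R}_1 \subseteq \mathbb{R}^p$ defined as in Section 2). Equivalently:

$$\mathbb{E}\{\mathbb{P}_\theta(T_{i,X}(y) = 1 \mid X)\} \le \alpha, \quad \text{for any } \theta \in \mathcal{R}_0,$$
$$\mathbb{E}\{\mathbb{P}_\theta(T_{i,X}(y) = 0 \mid X)\} \le \beta + \xi, \quad \text{for any } \theta \in \mathcal{R}_1.$$

We now take expectation of these inequalities with respect to $\theta \sim Q_0$ (in the first case) and $\theta \sim Q_1$ (in the second case) and we get, with the notation introduced in Definition 2.3,

$$\mathbb{E}\{\mathbb{P}_{Q,0,X}(T_{i,X}(y) = 1)\} \le \alpha,$$
$$\mathbb{E}\{\mathbb{P}_{Q,1,X}(T_{i,X}(y) = 0)\} \le \beta + \xi.$$

Call $\alpha_X \equiv \mathbb{P}_{Q,0,X}(T_{i,X}(y) = 1)$. By assumption, for any test $T$, we have $\mathbb{P}_{Q,1,X}(T_{i,X}(y) = 0) \ge \beta^{\rm bin}_{i,X}(\alpha_X; Q)$ and therefore the last inequalities imply

$$\mathbb{E}\{\alpha_X\} \le \alpha,$$
$$\mathbb{E}\{\beta^{\rm bin}_{i,X}(\alpha_X; Q)\} \le \beta + \xi.$$

The thesis follows since $\xi > 0$ is arbitrary.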



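6.2 Proof of Lemma 2.5

Fix $X$, $\alpha$, $i$, $S$ as in the statement and assume, without loss of generality, $P_S^\perp\tilde x_i \neq 0$, and $\mathrm{rank}(X_S) = |S| < n$. We take $Q_0 = \mathsf{N}(0, M\,\mathrm{I}_S)$ where $M \in \mathbb{R}_+$ and $\mathrm{I}_S \in \mathbb{R}^{p\times p}$ is the diagonal matrix with $(\mathrm{I}_S)_{jj} = 1$ if $j \in S$ and $(\mathrm{I}_S)_{jj} = 0$ otherwise. For the same covariance matrix $\mathrm{I}_S$, we let $Q_1 = \mathsf{N}(\mu e_i, M\,\mathrm{I}_S)$ where $e_i$ is the $i$-th element of the standard basis. Recalling that $i \notin S$, and $|S| < s_0$, the support of $Q_0$ is in $\mathcal{R}_0$ and the support of $Q_1$ is in $\mathcal{R}_1$.

Under $\mathbb{P}_{Q,0,X}$ we have $y \sim \mathsf{N}(0, M X_S X_S^{\mathsf T} + \sigma^2\mathrm{I})$, and under $\mathbb{P}_{Q,1,X}$ we have $y \sim \mathsf{N}(\mu\tilde x_i, M X_S X_S^{\mathsf T} + \sigma^2\mathrm{I})$. Hence the binary hypothesis testing problem under study reduces to the problem of testing a null hypothesis on the mean of a Gaussian random vector with known covariance against a simple alternative. It is well known that the most powerful test [LR05, Chapter 8] is obtained by comparing the ratio $\mathbb{P}_{Q,0,X}(y)/\mathbb{P}_{Q,1,X}(y)$ with a threshold. Equivalently, the most powerful test is of the form

$$T_{i,X}(y) = \mathbb{I}\Big\{\big|\langle \mu\tilde x_i, (M X_S X_S^{\mathsf T} + \sigma^2\mathrm{I})^{-1}y\rangle\big| \ge c\Big\},$$

for some $c \in \mathbb{R}$ that is to be chosen to achieve the desired significance level $\alpha$. Letting

$$\alpha \equiv 2\,\Phi\bigg(-\frac{c}{\mu\,\|(M X_S X_S^{\mathsf T} + \sigma^2\mathrm{I})^{-1/2}\tilde x_i\|}\bigg),$$

it is a straightforward calculation to derive the power of this test as

$$G\Big(\alpha,\ \mu\,\|(M X_S X_S^{\mathsf T} + \sigma^2\mathrm{I})^{-1/2}\tilde x_i\|\Big),$$

where the function $G(\alpha, u)$ is defined as per Eq. (12). Next we show that the power of this test converges to $1 - \beta^{\rm oracle}_{i,X}(\alpha; S, \mu)$ as $M \to \infty$. Hence the claim is proved by taking $M \ge M(\xi)$ for some $M(\xi)$ large enough.

Let $X_S = U\Lambda V^{\mathsf T}$ be a singular value decomposition of $X_S$. Therefore, the columns of $U$ form a basis for the linear subspace spanned by $\{\tilde x_j\}_{j\in S}$. Let $\tilde U$ be such that its columns form a basis for the orthogonal subspace $\{\tilde x_j\}_{j\in S}^\perp$. Then,

$$\frac{\mu\,\tilde x_i^{\mathsf T}(M X_S X_S^{\mathsf T} + \sigma^2\mathrm{I})^{-1}\tilde x_i}{\|(M X_S X_S^{\mathsf T} + \sigma^2\mathrm{I})^{-1/2}\tilde x_i\|} = \frac{\mu\,\tilde x_i^{\mathsf T}\{U(M\Lambda^2 + \sigma^2\mathrm{I})^{-1}U^{\mathsf T} + \sigma^{-2}\tilde U\tilde U^{\mathsf T}\}\tilde x_i}{\|\{U(M\Lambda^2 + \sigma^2\mathrm{I})^{-1/2}U^{\mathsf T} + \sigma^{-1}\tilde U\tilde U^{\mathsf T}\}\tilde x_i\|}.$$

Clearly, as $M \to \infty$, the right hand side of the above equation converges to $(\mu/\sigma)\|\tilde U^{\mathsf T}\tilde x_i\| = (\mu/\sigma)\|P_S^\perp\tilde x_i\|$, and thus the power converges to $1 - \beta^{\rm oracle}_{i,X}(\alpha; S, \mu) = G(\alpha, \mu\sigma^{-1}\|P_S^\perp\tilde x_i\|)$.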
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6.3 Proof of Theorem Q 

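Let $u_X \equiv \mu\|P_S^\perp\tilde x_i\|_2/\sigma$. By Lemmas 2.4 and 2.5 we have

$$1 - \beta_i^{\rm opt}(\alpha; \mu) \le \sup\Big\{\mathbb{E}\,G(\alpha_X, u_X) : \mathbb{E}(\alpha_X) \le \alpha\Big\},$$

with the sup taken over measurable functions $X \mapsto \alpha_X$, and $G(\alpha, u)$ defined as per Eq. (12).

It is easy to check that $\alpha \mapsto G(\alpha, u)$ is concave for any $u \in \mathbb{R}_+$ and $u \mapsto G(\alpha, u)$ is non-decreasing for any $\alpha \in [0, 1]$ (see Fig. 1). Further $G$ takes values in $[0, 1]$. Hence

$$\mathbb{E}\,G(\alpha_X, u_X) \le \mathbb{E}\{G(\alpha_X, u_X)\,\mathbb{I}(u_X \le u_0)\} + \mathbb{P}(u_X > u_0)$$
$$\le \mathbb{E}\{G(\alpha_X, u_0)\} + \mathbb{P}(u_X > u_0)$$
$$\le G(\mathbb{E}(\alpha_X), u_0) + \mathbb{P}(u_X > u_0)$$
$$\le G(\alpha, u_0) + \mathbb{P}(u_X > u_0).$$

Since $\tilde x_i$ and $X_S$ are jointly Gaussian, we have

$$\tilde x_i = X_S\,\Sigma_{S,S}^{-1}\Sigma_{S,i} + \Sigma_{i|S}^{1/2}\,z_i,$$

with $z_i \sim \mathsf{N}(0, \mathrm{I}_{n\times n})$ independent of $X_S$, and $\Sigma_{i|S}$ the conditional variance of the $i$-th covariate given those in $S$. It follows that

$$u_X \stackrel{\rm d}{=} \frac{\mu}{\sigma}\sqrt{\Sigma_{i|S}\,Z_{n-s_0+1}},$$

with $Z_{n-s_0+1}$ a chi-squared random variable with $n - s_0 + 1$ degrees of freedom. The desired claim follows by taking $u_0 = (\mu/\sigma)\sqrt{\Sigma_{i|S}(n - s_0 + \ell)}$.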

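6.4 Proof of Theorem 3.4

Since $\{(\Sigma(p) = \mathrm{I}_{p\times p}, \theta_0(p), n(p), \sigma(p))\}_{p\in\mathbb{N}}$ has a standard distributional limit, the empirical distribution of $\{(\theta_{0,i}, \hat\theta^u_i)\}_{i=1}^p$ converges weakly to $(\Theta_0, \Theta_0 + \tau Z)$ (with probability one). By the portmanteau theorem, and the fact that $\liminf_{p\to\infty}\sigma(p)/\sqrt{n(p)} = \sigma_0$, we have

$$\mathbb{P}(0 < |\Theta_0| < \mu_0\sigma_0) \le \lim_{p\to\infty}\frac{1}{p}\sum_{i=1}^{p}\mathbb{I}\Big(0 < |\theta_{0,i}| < \mu_0\frac{\sigma(p)}{\sqrt{n(p)}}\Big) = 0. \qquad (29)$$

In addition, since $\mu_0\sigma_0/2$ is a continuity point of the distribution of $\Theta_0$, we have

$$\lim_{p\to\infty}\frac{1}{p}\sum_{i=1}^{p}\mathbb{I}\Big(|\theta_{0,i}| \ge \frac{\mu_0\sigma_0}{2}\Big) = \mathbb{P}\Big(|\Theta_0| \ge \frac{\mu_0\sigma_0}{2}\Big). \qquad (30)$$

Now, by Eq. (29), $\mathbb{P}(|\Theta_0| \ge \mu_0\sigma_0/2) = \mathbb{P}(\Theta_0 \neq 0)$. Further, $\mathbb{I}(|\theta_{0,i}| \ge \mu_0\sigma_0/2) = \mathbb{I}(\theta_{0,i} \neq 0)$ for $1 \le i \le p$, as $p \to \infty$. Therefore, Eq. (30) yields

$$\lim_{p\to\infty}\frac{1}{p}|S_0(p)| = \lim_{p\to\infty}\frac{1}{p}\sum_{i=1}^{p}\mathbb{I}(\theta_{0,i} \neq 0) = \mathbb{P}(\Theta_0 \neq 0). \qquad (31)$$

Hence,

$$\lim_{p\to\infty}\frac{1}{|S_0(p)|}\sum_{i\in S_0(p)}T_{i,X}(y) = \lim_{p\to\infty}\frac{1}{|S_0(p)|}\sum_{i\in S_0(p)}\mathbb{I}(P_i \le \alpha)$$
$$= \frac{1}{\mathbb{P}(\Theta_0 \neq 0)}\lim_{p\to\infty}\frac{1}{p}\sum_{i=1}^{p}\mathbb{I}\big(P_i \le \alpha,\ i \in S_0(p)\big)$$
$$= \frac{1}{\mathbb{P}(\Theta_0 \neq 0)}\lim_{p\to\infty}\frac{1}{p}\sum_{i=1}^{p}\mathbb{I}\Big(\Phi^{-1}(1 - \alpha/2) \le \frac{|\hat\theta^u_i|}{\tau},\ |\theta_{0,i}| \ge \mu_0\frac{\sigma(p)}{\sqrt{n(p)}}\Big) \qquad (32)$$
$$\ge \frac{1}{\mathbb{P}(\Theta_0 \neq 0)}\,\mathbb{P}\Big(\Phi^{-1}(1 - \alpha/2) \le \Big|\frac{\Theta_0}{\tau} + Z\Big|,\ |\Theta_0| \ge \mu_0\sigma_0\Big).$$

Note that $\tau$ depends on the distribution $p_{\Theta_0}$. Since $|S_0(p)| \le \varepsilon p$, using Eq. (31), we have $\mathbb{P}(\Theta_0 \neq 0) \le \varepsilon$, i.e., $p_{\Theta_0}$ is $\varepsilon$-sparse. Let $\tilde\tau$ denote the maximum $\tau$ corresponding to densities in the family of $\varepsilon$-sparse densities. As shown in [DMM09], $\tilde\tau = \tau_*\sigma_0$, where $\tau_*$ is defined by Eqs. (16) and (17). Consequently,

$$\lim_{p\to\infty}\frac{1}{|S_0(p)|}\sum_{i\in S_0(p)}T_{i,X}(y) \ge \mathbb{P}\Big(\Phi^{-1}(1 - \alpha/2) \le \Big|\frac{\mu_0}{\tau_*} + Z\Big|\Big)$$
$$= 1 - \mathbb{P}\Big(-\Phi^{-1}(1 - \alpha/2) - \frac{\mu_0}{\tau_*} \le Z \le \Phi^{-1}(1 - \alpha/2) - \frac{\mu_0}{\tau_*}\Big) \qquad (33)$$
$$= 1 - \big\{\Phi(\Phi^{-1}(1 - \alpha/2) - \mu_0/\tau_*) - \Phi(-\Phi^{-1}(1 - \alpha/2) - \mu_0/\tau_*)\big\}$$
$$= G(\alpha, \mu_0/\tau_*).$$

Now, we take the expectation of both sides of Eq. (33) with respect to the law of the random design $X$ and random noise $w$. Changing the order of limit and expectation by applying the dominated convergence theorem and using linearity of expectation, we obtain

$$\lim_{p\to\infty}\frac{1}{|S_0(p)|}\sum_{i\in S_0(p)}\mathbb{E}_{X,w}\{T_{i,X}(y)\} \ge G\Big(\alpha, \frac{\mu_0}{\tau_*}\Big).$$

Since $T_{i,X}(y)$ takes values in $\{0, 1\}$, we have $\mathbb{E}_{X,w}\{T_{i,X}(y)\} = \mathbb{P}_{\theta_0(p)}(T_{i,X}(y) = 1)$. The result follows by noting that the columns of $X$ are exchangeable and therefore $\mathbb{P}_{\theta_0(p)}(T_{i,X}(y) = 1)$ does not depend on $i$.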



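6.5 Proof of Theorem 3.6

Since the sequence $\{(\Sigma(p), \theta_0(p), n(p), \sigma(p))\}_{p\in\mathbb{N}}$ has a standard distributional limit, with probability one the empirical distribution of $\{(\theta_{0,i}, \hat\theta^u_i, (\Sigma^{-1})_{i,i})\}_{i=1}^p$ converges weakly to the distribution of $(\Theta_0, \Theta_0 + \tau\Upsilon^{1/2}Z, \Upsilon)$. Therefore, with probability one, the empirical distribution of

$$\bigg\{\frac{\hat\theta^u_i - \theta_{0,i}}{\tau\,[(\Sigma^{-1})_{i,i}]^{1/2}}\bigg\}_{i=1}^{p}$$

converges weakly to $\mathsf{N}(0, 1)$. Hence,

$$\lim_{p\to\infty}\frac{1}{|S_0^c(p)|}\sum_{i\in S_0^c(p)}T_{i,X}(y) = \lim_{p\to\infty}\frac{1}{|S_0^c(p)|}\sum_{i\in S_0^c(p)}\mathbb{I}(P_i \le \alpha)$$
$$= \frac{1}{\mathbb{P}(\Theta_0 = 0)}\lim_{p\to\infty}\frac{1}{p}\sum_{i=1}^{p}\mathbb{I}\big(P_i \le \alpha,\ i \in S_0^c(p)\big)$$
$$= \frac{1}{\mathbb{P}(\Theta_0 = 0)}\lim_{p\to\infty}\frac{1}{p}\sum_{i=1}^{p}\mathbb{I}\bigg(\Phi^{-1}(1 - \alpha/2) \le \frac{|\hat\theta^u_i|}{\tau\,[(\Sigma^{-1})_{i,i}]^{1/2}},\ \theta_{0,i} = 0\bigg) \qquad (34)$$
$$= \frac{1}{\mathbb{P}(\Theta_0 = 0)}\,\mathbb{P}\big(\Phi^{-1}(1 - \alpha/2) \le |Z|,\ \Theta_0 = 0\big)$$
$$= \mathbb{P}\big(\Phi^{-1}(1 - \alpha/2) \le |Z|\big) = \alpha.$$

Applying the same argument as in the proof of Theorem 3.4, we obtain the following by taking the expectation of both sides of the above equation:

$$\lim_{p\to\infty}\frac{1}{|S_0^c(p)|}\sum_{i\in S_0^c(p)}\mathbb{P}_{\theta_0(p)}\big(T_{i,X}(y) = 1\big) = \alpha.$$

In particular, for the standard Gaussian design (cf. Theorem 3.3), since the columns of $X$ are exchangeable we get $\lim_{p\to\infty}\mathbb{P}_{\theta_0(p)}(T_{i,X}(y) = 1) = \alpha$ for all $i \notin S_0(p)$.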



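6.6 Proof of Theorem 3.7

The proof of Theorem 3.7 proceeds along the same lines as the proof of Theorem 3.4. Since $\{(\Sigma(p), \theta_0(p), n(p), \sigma(p))\}_{p\in\mathbb{N}}$ has a standard distributional limit, with probability one the empirical distribution of $\{(\theta_{0,i}, \hat\theta^u_i, (\Sigma^{-1})_{i,i})\}_{i=1}^p$ converges weakly to the distribution of $(\Theta_0, \Theta_0 + \tau\Upsilon^{1/2}Z, \Upsilon)$. Similar to Eq. (31), we have

$$\lim_{p\to\infty}\frac{1}{p}|S_0(p)| = \mathbb{P}(\Theta_0 \neq 0). \qquad (35)$$

Also

$$\lim_{p\to\infty}\frac{1}{|S_0(p)|}\sum_{i\in S_0(p)}T_{i,X}(y) = \lim_{p\to\infty}\frac{1}{|S_0(p)|}\sum_{i\in S_0(p)}\mathbb{I}(P_i \le \alpha)$$
$$= \frac{1}{\mathbb{P}(\Theta_0 \neq 0)}\lim_{p\to\infty}\frac{1}{p}\sum_{i=1}^{p}\mathbb{I}\big(P_i \le \alpha,\ i \in S_0(p)\big)$$
$$= \frac{1}{\mathbb{P}(\Theta_0 \neq 0)}\lim_{p\to\infty}\frac{1}{p}\sum_{i=1}^{p}\mathbb{I}\bigg(\Phi^{-1}(1 - \alpha/2) \le \frac{|\hat\theta^u_i|}{\tau\,[(\Sigma^{-1})_{i,i}]^{1/2}},\ \frac{|\theta_{0,i}|}{[(\Sigma^{-1})_{i,i}]^{1/2}} \ge \mu_0\bigg)$$
$$\ge \frac{1}{\mathbb{P}(\Theta_0 \neq 0)}\,\mathbb{P}\bigg(\Phi^{-1}(1 - \alpha/2) \le \Big|\frac{\Theta_0}{\tau\Upsilon^{1/2}} + Z\Big|,\ \frac{|\Theta_0|}{\Upsilon^{1/2}} \ge \mu_0\bigg)$$
$$\ge 1 - \big\{\Phi(\Phi^{-1}(1 - \alpha/2) - \mu_0/\tau) - \Phi(-\Phi^{-1}(1 - \alpha/2) - \mu_0/\tau)\big\}$$
$$= G(\alpha, \mu_0/\tau).$$

Similar to the proof of Theorem 3.4, by taking the expectation of both sides of the above inequality we get

$$\lim_{p\to\infty}\frac{1}{|S_0(p)|}\sum_{i\in S_0(p)}\mathbb{P}_{\theta_0(p)}\big(T_{i,X}(y) = 1\big) \ge G\Big(\alpha, \frac{\mu_0}{\tau}\Big).$$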
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A Statistical power of earlier approaches 

In this appendix, we briefly compare our results with those of Zhang and Zhang [ZZ11], and Bühlmann [Büh12]. Both of these papers consider deterministic designs under restricted eigenvalue conditions. As a consequence, controlling both type I and type II errors requires a significantly larger value of $\mu/\sigma$.

Following the treatment of [ZZ11], a necessary condition for rejecting $H_{0,j}$ with non-negligible probability is



> CTja{l + €'J, 



which follows immediately from |ZZ1H Eq. (23)]. Further tj and are lower bounded in |ZZllj as 
follows 



\Xj\\2 



n 



where for a standard Gaussian design ry* > ^logp. Using further ||xj||2 < 2-y/n which again holds 
with high probability for standard Gaussian designs, we get the necessary condition 



, rcjsologp a i 
> c max <^ , } , 



for some constant c'. 

In [Büh12], p-values are defined, in the notation of the present paper, as

$$P_j = 2\Big\{1 - \Phi\Big(\big(a_{n,p;j}(\sigma)\,|\hat\theta_{j,\rm corr}| - \Delta_j\big)_+\Big)\Big\},$$

with $\hat\theta_{j,\rm corr}$ a 'corrected' estimate of $\theta_{0,j}$, cf. [Büh12, Eq. (2.14)]. The corrected estimate $\hat\theta_{j,\rm corr}$ is defined by the following motivation. The ridge estimator bias, in general, can be decomposed into two terms. The first term is the estimation bias governed by the regularization, and the second term is the additional projection bias $P_X\theta_0 - \theta_0$, where $P_X$ denotes the orthogonal projector on the row space of $X$. The corrected estimate $\hat\theta_{j,\rm corr}$ is defined in such a way as to remove the second bias term under the null hypothesis $H_{0,j}$. Therefore, neglecting the first bias term, we have $\hat\theta_{j,\rm corr} = (P_X)_{jj}\theta_{0,j}$.
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Assuming the corrected estimate to be consistent (which it is in ii sense under the assumption 
of the paper), rejecting Hqj with non-neghgible probabihty requires 



^ f ^wfry — ^ — T ™ax{Aj, 1} , 



«n,p;i(0-)|(Px)iil 

Following |Biihl2| Eq. (2.13)], and keeping the dependence on sq instead of assuming sq 
o((n/ logp)^), we have 



„ l(Px)jfc| logp 

—-y- r r = C maX — fTSQ 1 ' 

an,p;j {(T) I (Px)ii I k&[p]\j \ (Px)ij | 



n 

Further, plugging for an,pj we have 

1 ^ '^VWj 

an,p;j (cr) I (Px) I Vn I (Px) jj | ' 

For a standard Gaussian design {p/n){Px)jk is approximately distributed as ui, where u = {ui,U2, ■ ■ ■ , 
Un) € is a uniformly random vector with ||u|| = 1. In particular ui is approximately N(0, 1/n). 
A standard calculation yields max,fcg[p]\j |(Px)jA:| > \/n \ogp/p with high probability. Furthermore, 
|(Px)jil concentrates around n/p. Finally, by definition of (cf. |Biihl21 Eq. (2.3)]) and using 
classical large deviation results about the singular values of a Gaussian matrix, we have Qjj > {n/p)"^ 
with high probability. Hence, a necessary condition for rejecting Hq^ with non-negligible probability 
is 



|eo,|>Cmax|— — ,— I 



n \/n . 

as stated in Section [ij 
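The two concentration statements used above for the projector $P_X$ can be inspected numerically. The following is a minimal sketch, assuming a standard Gaussian design with toy dimensions; the function names and the choice $n = 100$, $p = 400$ are illustrative and not taken from the paper.

```python
import numpy as np

def row_space_projector(X):
    """Orthogonal projector onto the row space of X (n x p, full row rank)."""
    # P_X = X^T (X X^T)^{-1} X, computed through a linear solve.
    return X.T @ np.linalg.solve(X @ X.T, X)

def projector_summary(n, p, j=0, seed=0):
    """Compare (P_X)_jj with n/p and max_k |(P_X)_jk| with sqrt(n log p)/p."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))          # standard Gaussian design (assumption)
    P = row_space_projector(X)
    off_diag = np.delete(np.abs(P[j]), j)    # |(P_X)_jk| for k != j
    return {
        "trace": float(np.trace(P)),         # equals n, the rank of the projector
        "P_jj": float(P[j, j]),
        "n_over_p": n / p,
        "max_off_diag": float(off_diag.max()),
        "sqrt_nlogp_over_p": float(np.sqrt(n * np.log(p)) / p),
    }

summary = projector_summary(n=100, p=400)
print(summary)
```

The diagonal entry should be close to `n_over_p`, and the largest off-diagonal entry of the row should be of the same order as `sqrt_nlogp_over_p`; the constants in the "with high probability" statements are not checked by this sketch.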



B Statistical physics calculation 



In this section we outline the replica calculation leading to Claim 4.1. We limit ourselves to the
main steps, since analogous calculations can be found in several earlier works [Tan02, GV05, TK10].
For a general introduction to the method and its motivation we refer to [MPV87, MM09]. Also, for
the sake of simplicity, we shall focus on characterizing the asymptotic distribution of $\hat\theta^u$, cf. Eq. (19).
The distribution of $r$ is derived by the same approach.

Fix a sequence of instances $\{(\Sigma(p),\theta_0(p),n(p),\sigma(p))\}_{p\in\mathbb{N}}$. For the sake of simplicity, we assume
$\sigma(p)^2 = n(p)\sigma_0^2$ and $n(p) = p\delta$ (the slightly more general case $\sigma(p)^2 = n(p)[\sigma_0^2+o(1)]$ and $n(p) = p[\delta+o(1)]$
does not require any change to the derivation given here, but is more cumbersome notationally).
Fix $\tilde g:\mathbb{R}\times\mathbb{R}\times\mathbb{R}\to\mathbb{R}$ a continuous function convex in its first argument, and let
$g(u,y,z)\equiv\max_{x\in\mathbb{R}}[ux-\tilde g(x,y,z)]$ be its Lagrange dual. The replica calculation aims at estimating the following
moment generating function (partition function)

$$Z_p(\beta,s) \equiv \int \exp\Big\{-\frac{\beta}{2n}\|y-X\theta\|_2^2-\beta J(\theta)-\beta s\sum_{i=1}^{p}\big[g(u_i,\theta_{0,i},(\Sigma^{-1})_{ii})-u_i\hat\theta^u_i\big]-\frac{\beta}{2n}(sd)^2\|X\Sigma^{-1}u\|_2^2\Big\}\,d\theta\,du. \tag{36}$$

Here $(y_i,x_i)$ are i.i.d. pairs distributed as per model (1) and $\hat\theta^u = \theta+(d/n)\,\Sigma^{-1}X^{\sf T}(y-X\theta)$ with
$d\in\mathbb{R}$ to be defined below. Further, $g:\mathbb{R}\times\mathbb{R}\times\mathbb{R}\to\mathbb{R}$ is a continuous function strictly convex in
its first argument. Finally, $s\in\mathbb{R}_+$ and $\beta>0$ is a 'temperature' parameter, not to be confused with
the type II error rate as used in the main text. We will eventually show that the appropriate choice
of $d$ is given by Eq. (24).
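As a concrete illustration of the map $\theta\mapsto\hat\theta^u = \theta+(d/n)\Sigma^{-1}X^{\sf T}(y-X\theta)$ that enters the partition function, here is a minimal sketch. The function name, the toy data and the choices $d = 1$, $\Sigma = \mathrm{I}$ are assumptions made for the example; in the paper $d$ is fixed by Eq. (24).

```python
import numpy as np

def debiased_estimate(X, y, theta, Sigma_inv, d):
    """Return theta + (d/n) * Sigma^{-1} X^T (y - X theta)."""
    n = X.shape[0]
    residual = y - X @ theta
    return theta + (d / n) * (Sigma_inv @ (X.T @ residual))

# Toy usage with hypothetical values (d = 1 and Sigma = I are placeholders).
rng = np.random.default_rng(0)
n, p = 50, 20
X = rng.standard_normal((n, p))
theta_true = np.zeros(p)
theta_true[:3] = 1.0
y = X @ theta_true + 0.1 * rng.standard_normal(n)
theta_u = debiased_estimate(X, y, theta=np.zeros(p), Sigma_inv=np.eye(p), d=1.0)
print(theta_u[:5])
```

When the residual $y-X\theta$ vanishes the correction term is zero, so $\hat\theta^u = \theta$.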

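Within the replica method, it is assumed that the limits $p\to\infty$, $\beta\to\infty$ exist almost surely for
the quantity $(p\beta)^{-1}\log Z_p(\beta,s)$, and that the order of the limits can be exchanged. We therefore
define

$$\mathfrak{F}(s) \equiv -\lim_{\beta\to\infty}\lim_{p\to\infty}\frac{1}{p\beta}\log Z_p(\beta,s) \tag{37}$$

$$\phantom{\mathfrak{F}(s)} \equiv -\lim_{p\to\infty}\lim_{\beta\to\infty}\frac{1}{p\beta}\log Z_p(\beta,s). \tag{38}$$

In other words, $\mathfrak{F}(s)$ is the exponential growth rate of $Z_p(\beta,s)$. It is also assumed that $p^{-1}\log Z_p(\beta,s)$
concentrates tightly around its expectation so that $\mathfrak{F}(s)$ can in fact be evaluated by computing

$$\mathfrak{F}(s) = -\lim_{\beta\to\infty}\lim_{p\to\infty}\frac{1}{p\beta}\,\mathbb{E}\log Z_p(\beta,s), \tag{39}$$

where expectation is being taken with respect to the distribution of $(y_1,x_1),\dots,(y_n,x_n)$. Notice
that, by Eq. (38) and using the Laplace method in the integral (36), we have

$$\mathfrak{F}(s) = \lim_{p\to\infty}\frac{1}{p}\min_{\theta,u\in\mathbb{R}^p}\Big\{\frac{1}{2n}\|y-X\theta\|_2^2+J(\theta)+s\sum_{i=1}^{p}\big[g(u_i,\theta_{0,i},(\Sigma^{-1})_{ii})-u_i\hat\theta^u_i\big]+\frac{1}{2n}(sd)^2\|X\Sigma^{-1}u\|_2^2\Big\}.$$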

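Finally we assume that the derivative of $\mathfrak{F}(s)$ as $s\to 0$ can be obtained by differentiating inside the
limit. This condition holds, for instance, if the cost function is strongly convex at $s = 0$. We get

$$\frac{d\mathfrak{F}}{ds}(s=0) = \lim_{p\to\infty}\frac{1}{p}\sum_{i=1}^{p}\min_{u_i\in\mathbb{R}}\big[g(u_i,\theta_{0,i},(\Sigma^{-1})_{ii})-u_i\hat\theta^u_i\big], \tag{40}$$

where $\hat\theta^u = \hat\theta+(d/n)\,\Sigma^{-1}X^{\sf T}(y-X\hat\theta)$ and $\hat\theta$ is the minimizer of the regularized least squares as in
Section 4. Since, by duality, $\tilde g(x,y,z)\equiv\max_{u\in\mathbb{R}}[ux-g(u,y,z)]$, we get

$$\frac{d\mathfrak{F}}{ds}(s=0) = -\lim_{p\to\infty}\frac{1}{p}\sum_{i=1}^{p}\tilde g(\hat\theta^u_i,\theta_{0,i},(\Sigma^{-1})_{ii}). \tag{41}$$

Hence, by computing $\mathfrak{F}(s)$ using Eq. (39) for a complete set of functions $\tilde g$, we get access to the
corresponding limit quantities (41) and hence, via standard weak convergence arguments, to the
joint empirical distribution of the triple $(\hat\theta^u_i,\theta_{0,i},(\Sigma^{-1})_{ii})$, cf. Eq. (20).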

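In order to carry out the calculation of $\mathfrak{F}(s)$, we begin by rewriting the partition function (36) in
a more convenient form. Using the definition of $\hat\theta^u$ and after a simple manipulation

$$Z_p(\beta,s) = \int\exp\Big\{-\frac{\beta}{2n}\|y-X(\theta+sd\,\Sigma^{-1}u)\|_2^2-\beta J(\theta)+\beta s\langle u,\theta\rangle-\beta s\sum_{i=1}^{p}g(u_i,\theta_{0,i},(\Sigma^{-1})_{ii})\Big\}\,d\theta\,du.$$

Define the measure $\nu(d\theta)$ over $\theta\in\mathbb{R}^p$ as follows

$$\nu(d\theta) = \int\exp\Big\{-\beta J(\theta-sd\,\Sigma^{-1}u)+\beta s\langle\theta-sd\,\Sigma^{-1}u,u\rangle-\beta s\sum_{i=1}^{p}g(u_i,\theta_{0,i},(\Sigma^{-1})_{ii})\Big\}\,du. \tag{42}$$

Using this definition and with the change of variable $\theta' = \theta+sd\,\Sigma^{-1}u$, we can rewrite Eq. (42) as

$$Z_p(\beta,s) \equiv \int\exp\Big\{-\frac{\beta}{2n}\|y-X\theta\|_2^2\Big\}\,\nu(d\theta)$$

$$= \int\exp\Big\{i\sqrt{\frac{\beta}{n}}\,\langle z,y-X\theta\rangle\Big\}\,\nu(d\theta)\,\gamma_n(dz)$$

$$= \int\exp\Big\{i\sqrt{\frac{\beta}{n}}\,\langle w,z\rangle+i\sqrt{\frac{\beta}{n}}\,\langle z,X(\theta_0-\theta)\rangle\Big\}\,\nu(d\theta)\,\gamma_n(dz), \tag{43}$$

where $\gamma_n(dz)$ denotes the standard Gaussian measure on $\mathbb{R}^n$: $\gamma_n(dz)\equiv(2\pi)^{-n/2}\exp(-\|z\|_2^2/2)\,dz$.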



The replica method aims at computing the expected log-partition function, cf. Eq. (39), using
the identity

$$\mathbb{E}\log Z_p(\beta,s) = \frac{d}{dk}\bigg|_{k=0}\log\mathbb{E}\big\{Z_p(\beta,s)^k\big\}. \tag{44}$$



This formula would require computing fractional moments of $Z_p$ as $k\to 0$. The replica method
consists in a prescription that allows one to compute a formal expression for $k$ integer, and then
extrapolate it as $k\to 0$. Crucially, the limit $k\to 0$ is inverted with the one $p\to\infty$:

$$\lim_{p\to\infty}\frac{1}{p}\,\mathbb{E}\log Z_p(\beta,s) = \frac{d}{dk}\bigg|_{k=0}\lim_{p\to\infty}\frac{1}{p}\log\mathbb{E}\big\{Z_p(\beta,s)^k\big\}. \tag{45}$$

In order to represent $Z_p(\beta,s)^k$, we use the identity

$$\Big(\int f(x)\,\rho(dx)\Big)^k = \int f(x^1)f(x^2)\cdots f(x^k)\,\rho(dx^1)\cdots\rho(dx^k). \tag{46}$$
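The identity (44) behind the replica trick can be checked numerically on a toy example. The sketch below assumes a log-normal random variable in place of the partition function (an illustrative choice, not the paper's $Z_p$) and compares $\mathbb{E}\log Z$ with a central finite difference of $k\mapsto\log\mathbb{E}\{Z^k\}$ at $k = 0$.

```python
import numpy as np

def replica_derivative(log_Z_samples, h=1e-3):
    """Central difference of k -> log E[Z^k] at k = 0, from samples of log Z."""
    def log_moment(k):
        # log of the empirical mean of Z^k = exp(k * log Z), via log-sum-exp.
        a = k * log_Z_samples
        m = a.max()
        return m + np.log(np.mean(np.exp(a - m)))
    return (log_moment(h) - log_moment(-h)) / (2.0 * h)

rng = np.random.default_rng(0)
# Toy stand-in for a partition function: log Z ~ N(1.5, 0.7^2) (assumption).
log_Z = rng.normal(loc=1.5, scale=0.7, size=100_000)
lhs = float(np.mean(log_Z))              # E log Z
rhs = float(replica_derivative(log_Z))   # d/dk log E[Z^k] at k = 0
print(lhs, rhs)
```

The two numbers agree up to the finite-difference error, which is of order $h^2$. The non-rigorous part of the replica method is not this identity but the extrapolation from integer $k$ to $k\to 0$ and the exchange of limits.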



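In order to apply this formula to Eq. (43), we let, with a slight abuse of notation, $\nu^k(d\theta)\equiv\nu(d\theta^1)\times\nu(d\theta^2)\times\cdots\times\nu(d\theta^k)$
be a measure over $(\mathbb{R}^p)^k$, with $\theta^1,\dots,\theta^k\in\mathbb{R}^p$. Analogously $\gamma_n^k(dz)\equiv\gamma_n(dz^1)\times\gamma_n(dz^2)\times\cdots\times\gamma_n(dz^k)$,
with $z^1,\dots,z^k\in\mathbb{R}^n$. With these notations, we have

$$\mathbb{E}\{Z_p(\beta,s)^k\} = \int\mathbb{E}\exp\Big\{i\sqrt{\frac{\beta}{n}}\Big\langle w,\sum_{a=1}^{k}z^a\Big\rangle+i\sqrt{\frac{\beta}{n}}\Big\langle X,\sum_{a=1}^{k}z^a(\theta_0-\theta^a)^{\sf T}\Big\rangle\Big\}\,\nu^k(d\theta)\,\gamma_n^k(dz). \tag{47}$$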

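In the above expression $\mathbb{E}$ denotes expectation with respect to the noise vector $w$ and the design
matrix $X$. Further, we used $\langle\,\cdot\,,\,\cdot\,\rangle$ to denote the matrix scalar product as well: $\langle A,B\rangle\equiv\mathrm{Trace}(A^{\sf T}B)$.

At this point we can take the expectation with respect to $w$, $X$. We use the fact that, for any
$M\in\mathbb{R}^{n\times p}$, $u\in\mathbb{R}^n$,

$$\mathbb{E}\{\exp(i\langle w,u\rangle)\} = \exp\Big\{-\frac{1}{2}\,n\sigma_0^2\,\|u\|_2^2\Big\},$$

$$\mathbb{E}\{\exp(i\langle M,X\rangle)\} = \exp\Big\{-\frac{1}{2}\langle M,M\Sigma\rangle\Big\}.$$

Using these identities in Eq. (47), we obtain

$$\mathbb{E}\{Z_p^k\} = \int\exp\Big\{-\frac{1}{2}\beta\sigma_0^2\sum_{a=1}^{k}\|z^a\|_2^2-\frac{\beta}{2n}\sum_{a,b=1}^{k}\langle z^a,z^b\rangle\,\langle(\theta^a-\theta_0),\Sigma(\theta^b-\theta_0)\rangle\Big\}\,\nu^k(d\theta)\,\gamma_n^k(dz). \tag{48}$$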
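The Gaussian identity $\mathbb{E}\{\exp(i\langle M,X\rangle)\} = \exp\{-\tfrac12\langle M,M\Sigma\rangle\}$ used in this step can be sanity-checked by Monte Carlo. The following is a sketch under the assumption that the rows of $X$ are i.i.d. $N(0,\Sigma)$; the matrices `M` and `Sigma` below are arbitrary small examples, not quantities from the paper.

```python
import numpy as np

def empirical_char_fn(M, Sigma, num_samples=200_000, seed=0):
    """Monte Carlo estimate of E exp(i <M, X>) for X with i.i.d. N(0, Sigma) rows."""
    rng = np.random.default_rng(seed)
    n, p = M.shape
    L = np.linalg.cholesky(Sigma)                 # Sigma = L L^T
    G = rng.standard_normal((num_samples, n, p))
    X = G @ L.T                                   # each row of each sample is N(0, Sigma)
    inner = np.sum(X * M, axis=(1, 2))            # <M, X> = Trace(M^T X)
    return np.mean(np.exp(1j * inner))

def closed_form(M, Sigma):
    """exp(-<M, M Sigma> / 2), the right-hand side of the identity."""
    return float(np.exp(-0.5 * np.sum(M * (M @ Sigma))))

M = np.array([[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
estimate = empirical_char_fn(M, Sigma)
exact = closed_form(M, Sigma)
print(estimate, exact)
```

The real part of the estimate should match the closed form up to Monte Carlo error of order $1/\sqrt{\texttt{num\_samples}}$, and the imaginary part should be close to zero.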

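We next use the identity

$$e^{-xy} = \frac{1}{2\pi i}\int_{(-i\infty,i\infty)}\int_{(-\infty,\infty)}e^{-\zeta q+\zeta x-qy}\,d\zeta\,dq,$$

where the integral is over $\zeta\in(-i\infty,i\infty)$ (imaginary axis) and $q\in(-\infty,\infty)$. We apply this identity
to Eq. (48), and introduce integration variables $Q\equiv(Q_{ab})_{1\le a,b\le k}$ and $\Lambda\equiv(\Lambda_{ab})_{1\le a,b\le k}$. Letting
$dQ\equiv\prod_{a,b}dQ_{ab}$ and $d\Lambda\equiv\prod_{a,b}d\Lambda_{ab}$,

$$\mathbb{E}\{Z_p^k\} = \Big(\frac{\beta n}{4\pi i}\Big)^{k^2}\int\exp\{-p\,\mathcal{S}_k(Q,\Lambda)\}\,dQ\,d\Lambda, \tag{49}$$

$$\mathcal{S}_k(Q,\Lambda) = \frac{\beta\delta}{2}\sum_{a,b=1}^{k}\Lambda_{ab}Q_{ab}-\frac{1}{p}\log\xi(\Lambda)-\delta\log\hat\xi(Q), \tag{50}$$

$$\xi(\Lambda) \equiv \int\exp\Big\{\frac{\beta}{2}\sum_{a,b=1}^{k}\Lambda_{ab}\langle(\theta^a-\theta_0),\Sigma(\theta^b-\theta_0)\rangle\Big\}\,\nu^k(d\theta), \tag{51}$$

$$\hat\xi(Q) \equiv \int\exp\Big\{-\frac{\beta}{2}\sum_{a,b=1}^{k}(\sigma_0^2\,\mathrm{I}+Q)_{a,b}\,z_1^a z_1^b\Big\}\,\gamma_1^k(dz_1). \tag{52}$$

Notice that above we used the fact that, after introducing $Q$, $\Lambda$, the integral over $(z^1,\dots,z^k)\in(\mathbb{R}^n)^k$
factors into $n$ integrals over $\mathbb{R}^k$ with measure $\gamma_1^k(dz_1)$.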

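We next use the saddle point method in Eq. (49) to obtain

$$-\lim_{p\to\infty}\frac{1}{p}\log\mathbb{E}\{Z_p^k\} = \mathcal{S}_k(Q^*,\Lambda^*), \tag{53}$$

where $Q^*$, $\Lambda^*$ is the saddle-point location. The replica method provides a hierarchy of ansatz for this
saddle-point. The first level of this hierarchy is the so-called replica symmetric ansatz, postulating
that $Q^*$, $\Lambda^*$ ought to be invariant under permutations of the row/column indices. This is motivated
by the fact that $\mathcal{S}_k(Q,\Lambda)$ is indeed left unchanged by such a change of variables. This is equivalent to
postulating that

$$Q^*_{ab} = \begin{cases} q_1 & \text{if } a=b,\\ q_0 & \text{otherwise,}\end{cases}\qquad \Lambda^*_{ab} = \begin{cases}\beta\zeta_1 & \text{if } a=b,\\ \beta\zeta_0 & \text{otherwise,}\end{cases} \tag{54}$$

where the factor $\beta$ is for future convenience. Given that the partition function, cf. Eq. (36), is the
integral of a log-concave function, it is expected that the replica-symmetric ansatz yields in fact the
correct result [MPV87, MM09].

The next step consists in substituting the above expressions for $Q^*$, $\Lambda^*$ in $\mathcal{S}_k(\,\cdot\,,\,\cdot\,)$ and then
taking the limit $k\to 0$. We will consider separately each term of $\mathcal{S}_k(Q,\Lambda)$, cf. Eq. (50).
Let us begin with the first term

$$\sum_{a,b=1}^{k}\Lambda^*_{ab}Q^*_{ab} = k\beta\zeta_1 q_1+k(k-1)\beta\zeta_0 q_0. \tag{55}$$

Hence

$$\lim_{k\to 0}\frac{\beta\delta}{2k}\sum_{a,b=1}^{k}\Lambda^*_{ab}Q^*_{ab} = \frac{\beta^2\delta}{2}\,(\zeta_1 q_1-\zeta_0 q_0). \tag{56}$$

Let us consider $\hat\xi(Q^*)$. We have

$$\log\hat\xi(Q^*) = -\frac{1}{2}\log\mathrm{Det}\big(\mathrm{I}+\beta\sigma_0^2\,\mathrm{I}+\beta Q^*\big) \tag{57}$$

$$= -\frac{k-1}{2}\log\big(1+\beta(q_1-q_0)\big)-\frac{1}{2}\log\big(1+\beta(q_1-q_0)+\beta k(\sigma_0^2+q_0)\big). \tag{58}$$

In the limit $k\to 0$ we thus obtain

$$\lim_{k\to 0}\frac{1}{k}(-\delta)\log\hat\xi(Q^*) = \frac{\delta}{2}\log\big(1+\beta(q_1-q_0)\big)+\frac{\delta}{2}\,\frac{\beta(\sigma_0^2+q_0)}{1+\beta(q_1-q_0)}. \tag{59}$$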
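The two logarithms appearing in the expression for $\log\hat\xi(Q^*)$ are the log-eigenvalues of a $k\times k$ matrix of the form $a\,\mathrm{I}+b\,\mathbb{1}\mathbb{1}^{\sf T}$, which has eigenvalue $a$ with multiplicity $k-1$ and eigenvalue $a+kb$ once. The sketch below checks this spectral fact numerically; the identification $a = 1+\beta(q_1-q_0)$, $b = \beta(\sigma_0^2+q_0)$ is read off from the displayed formula, and the numerical values are arbitrary.

```python
import numpy as np

def rs_log_det(a, b, k):
    """log det(a I_k + b 1 1^T), using eigenvalues a (k-1 times) and a + k b."""
    return (k - 1) * np.log(a) + np.log(a + k * b)

# Arbitrary illustrative values (assumptions, not from the paper).
k, beta, q1, q0, sigma0_sq = 5, 2.0, 1.3, 0.4, 0.5
a = 1.0 + beta * (q1 - q0)
b = beta * (sigma0_sq + q0)

A = a * np.eye(k) + b * np.ones((k, k))
sign, logdet = np.linalg.slogdet(A)
print(logdet, rs_log_det(a, b, k))
```

Because the closed form is an explicit function of $k$, it can be continued to non-integer $k$, which is what makes the limit $k\to 0$ formally computable.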



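Finally, introducing the notation $\|v\|_\Sigma^2\equiv\langle v,\Sigma v\rangle$, we have

$$\xi(\Lambda^*) = \int\exp\Big\{\frac{\beta^2}{2}(\zeta_1-\zeta_0)\sum_{a=1}^{k}\|\theta^a-\theta_0\|_\Sigma^2+\frac{\beta^2\zeta_0}{2}\sum_{a,b=1}^{k}\langle(\theta^a-\theta_0),\Sigma(\theta^b-\theta_0)\rangle\Big\}\,\nu^k(d\theta)$$

$$= \mathbb{E}\int\exp\Big\{\frac{\beta^2}{2}(\zeta_1-\zeta_0)\sum_{a=1}^{k}\|\theta^a-\theta_0\|_\Sigma^2+\beta\sqrt{\zeta_0}\sum_{a=1}^{k}\langle Z,\Sigma^{1/2}(\theta^a-\theta_0)\rangle\Big\}\,\nu^k(d\theta),$$

where expectation is with respect to $Z\sim N(0,\mathrm{I}_{p\times p})$. Notice that, given $Z\in\mathbb{R}^p$, the integrals over
$\theta^1,\theta^2,\dots,\theta^k$ factorize, whence

$$\xi(\Lambda^*) = \mathbb{E}\bigg\{\Big[\int\exp\Big\{\frac{\beta^2}{2}(\zeta_1-\zeta_0)\|\theta-\theta_0\|_\Sigma^2+\beta\sqrt{\zeta_0}\,\langle Z,\Sigma^{1/2}(\theta-\theta_0)\rangle\Big\}\,\nu(d\theta)\Big]^k\bigg\}. \tag{60}$$

Therefore

$$\lim_{k\to 0}\frac{1}{pk}\log\xi(\Lambda^*) = \frac{1}{p}\,\mathbb{E}\log\Big\{\int\exp\Big\{\frac{\beta^2}{2}(\zeta_1-\zeta_0)\|\theta-\theta_0\|_\Sigma^2+\beta\sqrt{\zeta_0}\,\langle Z,\Sigma^{1/2}(\theta-\theta_0)\rangle\Big\}\,\nu(d\theta)\Big\}. \tag{61}$$

Putting Eqs. (56), (59), and (61) together we obtain

$$-\lim_{p\to\infty}\frac{1}{p}\,\mathbb{E}\log Z_p = \lim_{k\to 0}\frac{1}{k}\,\mathcal{S}_k(Q^*,\Lambda^*) \tag{62}$$

$$= \frac{\beta^2\delta}{2}(\zeta_1 q_1-\zeta_0 q_0)+\frac{\delta}{2}\log\big(1+\beta(q_1-q_0)\big)+\frac{\delta}{2}\,\frac{\beta(\sigma_0^2+q_0)}{1+\beta(q_1-q_0)}$$

$$\quad-\lim_{p\to\infty}\frac{1}{p}\,\mathbb{E}\log\Big\{\int\exp\Big\{\frac{\beta^2}{2}(\zeta_1-\zeta_0)\|\theta-\theta_0\|_\Sigma^2+\beta\sqrt{\zeta_0}\,\langle Z,\Sigma^{1/2}(\theta-\theta_0)\rangle\Big\}\,\nu(d\theta)\Big\}.$$

We can next take the limit $\beta\to\infty$. In doing this, one has to be careful with respect to the
behavior of the saddle point parameters $q_0$, $q_1$, $\zeta_0$, $\zeta_1$. A careful analysis (omitted here) shows that
$q_0$, $q_1$ have the same limit, denoted here by $q_0$, and $\zeta_0$, $\zeta_1$ have the same limit, denoted by $\zeta_0$. Moreover
$q_1-q_0 = (q/\beta)+o(\beta^{-1})$ and $\zeta_1-\zeta_0 = (-\zeta/\beta)+o(\beta^{-1})$. Substituting in the above expression, and
using Eq. (39), we get

$$\mathfrak{F}(s) = \frac{\delta}{2}(\zeta_0 q-\zeta q_0)+\frac{\delta}{2}\,\frac{q_0+\sigma_0^2}{1+q}+\lim_{p\to\infty}\frac{1}{p}\,\mathbb{E}\min_{\theta\in\mathbb{R}^p}\Big\{\frac{\zeta}{2}\|\theta-\theta_0\|_\Sigma^2-\sqrt{\zeta_0}\,\langle Z,\Sigma^{1/2}(\theta-\theta_0)\rangle+\tilde J(\theta;s)\Big\}, \tag{63}$$

$$\tilde J(\theta;s) = \min_{u\in\mathbb{R}^p}\Big\{J(\theta-sd\,\Sigma^{-1}u)-s\langle\theta-sd\,\Sigma^{-1}u,u\rangle+s\sum_{i=1}^{p}g(u_i,\theta_{0,i},(\Sigma^{-1})_{ii})\Big\}. \tag{64}$$

After the change of variable $\theta-sd\,\Sigma^{-1}u\to\theta$, this reads

$$\mathfrak{F}(s) = \frac{\delta}{2}(\zeta_0 q-\zeta q_0)+\frac{\delta}{2}\,\frac{q_0+\sigma_0^2}{1+q}-\frac{\zeta_0}{2\zeta}+\lim_{p\to\infty}\frac{1}{p}\,\mathbb{E}\min_{\theta,u\in\mathbb{R}^p}\Big\{\frac{\zeta}{2}\Big\|\theta-\theta_0-\frac{\sqrt{\zeta_0}}{\zeta}\Sigma^{-1/2}Z+sd\,\Sigma^{-1}u\Big\|_\Sigma^2+\tilde J(\theta,u;s)\Big\}, \tag{65}$$

$$\tilde J(\theta,u;s) = J(\theta)-s\langle\theta,u\rangle+s\sum_{i=1}^{p}g(u_i,\theta_{0,i},(\Sigma^{-1})_{ii}). \tag{66}$$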



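Finally, we must set $\zeta$, $\zeta_0$ and $q$, $q_0$ to their saddle point values. We start by using the stationarity
conditions with respect to $q$, $q_0$:

$$\frac{\partial\mathfrak{F}}{\partial q}(s) = \frac{\delta}{2}\zeta_0-\frac{\delta}{2}\,\frac{q_0+\sigma_0^2}{(1+q)^2}, \tag{67}$$

$$\frac{\partial\mathfrak{F}}{\partial q_0}(s) = -\frac{\delta}{2}\zeta+\frac{\delta}{2}\,\frac{1}{1+q}. \tag{68}$$

We use these to eliminate $q$ and $q_0$. Renaming $\zeta_0 = \zeta^2\tau^2$, we get our final expression for $\mathfrak{F}(s)$:

$$\mathfrak{F}(s) = -\frac{1}{2}(1-\delta)\zeta\tau^2-\frac{\delta}{2}\zeta^2\tau^2+\frac{\delta}{2}\sigma_0^2\zeta+\lim_{p\to\infty}\frac{1}{p}\,\mathbb{E}\min_{\theta,u\in\mathbb{R}^p}\Big\{\frac{\zeta}{2}\big\|\theta-\theta_0-\tau\Sigma^{-1/2}Z+sd\,\Sigma^{-1}u\big\|_\Sigma^2+\tilde J(\theta,u;s)\Big\}, \tag{69}$$

$$\tilde J(\theta,u;s) = J(\theta)-s\langle\theta,u\rangle+s\sum_{i=1}^{p}g(u_i,\theta_{0,i},(\Sigma^{-1})_{ii}). \tag{70}$$

Here it is understood that $\zeta$ and $\tau^2$ are to be set to their saddle point values.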

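We are interested in the derivative of $\mathfrak{F}(s)$ with respect to $s$, cf. Eq. (41). Consider first the case
$s = 0$. Using the assumption $\mathfrak{E}^{(p)}(a,b)\to\mathfrak{E}(a,b)$, cf. Eq. (25), we get

$$\mathfrak{F}(s=0) = -\frac{1}{2}(1-\delta)\zeta\tau^2-\frac{\delta}{2}\zeta^2\tau^2+\frac{\delta}{2}\sigma_0^2\zeta+\mathfrak{E}(\tau^2,\zeta). \tag{71}$$

The values of $\zeta$, $\tau^2$ are obtained by setting to zero the partial derivatives

$$\frac{\partial\mathfrak{F}}{\partial\zeta}(s=0) = -\frac{1}{2}(1-\delta)\tau^2-\delta\zeta\tau^2+\frac{\delta}{2}\sigma_0^2+\frac{\partial\mathfrak{E}}{\partial\zeta}(\tau^2,\zeta), \tag{72}$$

$$\frac{\partial\mathfrak{F}}{\partial\tau^2}(s=0) = -\frac{1}{2}(1-\delta)\zeta-\frac{\delta}{2}\zeta^2+\frac{\partial\mathfrak{E}}{\partial\tau^2}(\tau^2,\zeta). \tag{73}$$

Define, as in the statement of the Replica Claim,

$$\mathsf{E}_1(a,b) \equiv \lim_{p\to\infty}\frac{1}{p}\,\mathbb{E}\big\{\|\eta_b(\theta_0+\sqrt{a}\,\Sigma^{-1/2}Z)-\theta_0\|_\Sigma^2\big\}, \tag{74}$$

$$\mathsf{E}_2(a,b) \equiv \lim_{p\to\infty}\frac{1}{p}\,\mathbb{E}\big\{\mathrm{div}\,\eta_b(\theta_0+\sqrt{a}\,\Sigma^{-1/2}Z)\big\} \tag{75}$$

$$= \lim_{p\to\infty}\frac{1}{p\sqrt{a}}\,\mathbb{E}\big\{\langle\eta_b(\theta_0+\sqrt{a}\,\Sigma^{-1/2}Z),\Sigma^{1/2}Z\rangle\big\},$$

where the last identity follows by integration by parts. These limits exist by the assumption that
$\nabla\mathfrak{E}^{(p)}(a,b)\to\nabla\mathfrak{E}(a,b)$. In particular

$$\frac{\partial\mathfrak{E}}{\partial\zeta}(\tau^2,\zeta) = \frac{1}{2}\mathsf{E}_1(\tau^2,\zeta)-\tau^2\,\mathsf{E}_2(\tau^2,\zeta)+\frac{1}{2}\tau^2, \tag{76}$$

$$\frac{\partial\mathfrak{E}}{\partial\tau^2}(\tau^2,\zeta) = -\frac{\zeta}{2}\mathsf{E}_2(\tau^2,\zeta)+\frac{\zeta}{2}. \tag{77}$$

Substituting these expressions in Eqs. (72), (73), and simplifying, we conclude that the derivatives
vanish if and only if $\zeta$, $\tau^2$ satisfy the following equations

$$\tau^2 = \sigma_0^2+\frac{1}{\delta}\,\mathsf{E}_1(\tau^2,\zeta), \tag{78}$$

$$\zeta = 1-\frac{1}{\delta}\,\mathsf{E}_2(\tau^2,\zeta). \tag{79}$$

The solution of these equations is expected to be unique for $J$ convex and $\sigma_0^2>0$.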
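To make the pair of fixed-point equations for $(\tau^2,\zeta)$ concrete, the following sketch iterates them in a special case. It assumes $\Sigma = \mathrm{I}$ and $J(\theta) = \lambda\|\theta\|_1$, so that $\eta_b$ is soft thresholding at level $\lambda/b$, $\mathsf{E}_1$ is a mean squared error and $\mathsf{E}_2$ is the fraction of coordinates above the threshold. The expectations are replaced by empirical averages over one fixed Gaussian sample, and all numerical values are hypothetical. Plain iteration is used for simplicity; its convergence is not guaranteed in general.

```python
import numpy as np

def soft_threshold(y, t):
    """Proximal operator of t * |.|, applied coordinate-wise."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def solve_fixed_point(theta0, lam, delta, sigma0_sq, num_iter=200, seed=0):
    """Iterate tau^2 = sigma0^2 + E1/delta and zeta = 1 - E2/delta (Sigma = I, Lasso)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(theta0.shape)      # common random numbers across iterations
    tau_sq = sigma0_sq + np.mean(theta0 ** 2) / delta
    zeta = 1.0
    for _ in range(num_iter):
        y = theta0 + np.sqrt(tau_sq) * Z
        eta = soft_threshold(y, lam / zeta)    # eta_zeta(y) for J = lam * ||.||_1
        E1 = np.mean((eta - theta0) ** 2)      # empirical E_1(tau^2, zeta)
        E2 = np.mean(np.abs(y) > lam / zeta)   # empirical divergence of eta, per coordinate
        tau_sq = sigma0_sq + E1 / delta
        zeta = 1.0 - E2 / delta
        if zeta <= 0.0:
            raise ValueError("zeta became non-positive; increase lam or delta")
    return float(tau_sq), float(zeta)

# Toy usage: 5% of the coordinates equal to 1, the rest zero (assumption).
rng = np.random.default_rng(1)
theta0 = np.where(rng.random(50_000) < 0.05, 1.0, 0.0)
tau_sq, zeta = solve_fixed_point(theta0, lam=1.0, delta=0.6, sigma0_sq=0.25)
print(tau_sq, zeta, 1.0 / zeta)   # the last value corresponds to the choice d = 1/zeta
```

With $\lambda = 0$ the thresholding is the identity, so $\mathsf{E}_1 = \tau^2$ and $\mathsf{E}_2 = 1$, and for $\delta>1$ the equations reduce to $\tau^2 = \sigma_0^2\delta/(\delta-1)$ and $\zeta = 1-1/\delta$, which gives a simple check of the iteration.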
Next consider the derivative of $\mathfrak{F}(s)$ with respect to $s$, which is our main object of interest, cf.
Eq. (41). By differentiating Eq. (69) and inverting the order of derivative and limit, we get

$$\frac{d\mathfrak{F}}{ds}(s=0) = \lim_{p\to\infty}\frac{1}{p}\,\mathbb{E}\min_{u\in\mathbb{R}^p}\Big\{\zeta d\,\langle u,\hat\theta-\theta_0-\tau\Sigma^{-1/2}Z\rangle-\langle\hat\theta,u\rangle+\sum_{i=1}^{p}g(u_i,\theta_{0,i},(\Sigma^{-1})_{ii})\Big\}, \tag{80}$$

where $\hat\theta$ is the minimizer at $s = 0$, i.e., $\hat\theta = \eta_\zeta(\theta_0+\tau\Sigma^{-1/2}Z)$, and $\zeta$, $\tau^2$ solve Eqs. (78), (79). At this
point we choose $d = 1/\zeta$. Minimizing over $u$ (recall that $\tilde g(x,y,z) = \max_{u\in\mathbb{R}}[ux-g(u,y,z)]$), we get

$$\frac{d\mathfrak{F}}{ds}(s=0) = -\lim_{p\to\infty}\frac{1}{p}\sum_{i=1}^{p}\mathbb{E}\,\tilde g\big(\theta_{0,i}+\tau(\Sigma^{-1/2}Z)_i,\theta_{0,i},(\Sigma^{-1})_{ii}\big). \tag{81}$$

Comparing with Eq. (41), this proves the claim that the standard distributional limit does indeed
hold.

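Notice that $\tau^2$ is given by Eq. (78), which, for $d = 1/\zeta$, does indeed coincide with the claimed
Eq. (28). Finally consider the scale parameter $d = d(p)$ defined by Eq. (24). We claim that

$$\lim_{p\to\infty}d(p) = d = \frac{1}{\zeta}. \tag{82}$$

Consider, for the sake of simplicity, the case that $J$ is differentiable and strictly convex (the general
case can be obtained as a limit). Then the minimum condition of the proximal operator (26) reads

$$\hat\theta = \eta_b(y)\quad\Leftrightarrow\quad b\,\Sigma(y-\hat\theta) = \nabla J(\hat\theta). \tag{83}$$

Differentiating with respect to $y$, and denoting by $\mathrm{D}\eta_b$ the Jacobian of $\eta_b$, we get
$\mathrm{D}\eta_b(y) = (\mathrm{I}+b^{-1}\Sigma^{-1}\nabla^2J(\hat\theta))^{-1}$ and hence

$$\mathsf{E}_2(a,b) = \lim_{p\to\infty}\frac{1}{p}\,\mathbb{E}\,\mathrm{Trace}\Big\{\big(\mathrm{I}+b^{-1}\Sigma^{-1/2}\nabla^2J(\hat\theta)\Sigma^{-1/2}\big)^{-1}\Big\}, \tag{84}$$

$$\hat\theta \equiv \eta_b(\theta_0+\sqrt{a}\,\Sigma^{-1/2}Z). \tag{85}$$

Hence, combining Eqs. (79) and (84) implies that $d = \zeta^{-1}$ satisfies

$$1 = \frac{1}{d}+\lim_{n\to\infty}\frac{1}{n}\,\mathbb{E}\,\mathrm{Trace}\Big\{\big(\mathrm{I}+d\,\Sigma^{-1/2}\nabla^2J(\hat\theta)\Sigma^{-1/2}\big)^{-1}\Big\}, \tag{86}$$

$$\hat\theta \equiv \eta_{1/d}(\theta_0+\tau\Sigma^{-1/2}Z). \tag{87}$$

The claim (82) follows by comparing this with Eq. (24), and noting that, by the above, $\hat\theta$ is indeed
asymptotically distributed as the estimator (23).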
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C Simulation results 



Consider the setup discussed in Section 3.1.1. We compute type I error and the average power for
SDL-TEST and Bühlmann's method [Büh12] for 10 realizations of each configuration. The experiment
results for the case of identity covariance ($\Sigma = \mathrm{I}_{p\times p}$) are summarized in Tables 8 and 9. Table 8 and
Table 9 respectively correspond to significance levels $\alpha = 0.05$ and $\alpha = 0.025$. The results are also
compared with the asymptotic results given in Theorem 3.4.

The results for the case of circulant covariance matrix are summarized in Tables 10 and 11.
Table 10 and Table 11 respectively correspond to significance levels $\alpha = 0.05$ and $\alpha = 0.025$. The
results are also compared with the lower bound given in Theorem 3.7.

For each configuration, the tables contain the means and the standard deviations of type I errors
and the powers across 10 realizations. A quadruple such as (1000, 600, 50, 0.1) denotes the values of
$p = 1000$, $n = 600$, $s_0 = 50$, $\mu = 0.1$.
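The quantities reported in the tables can be computed from a vector of p-values as in the following sketch. The function names are hypothetical, the rejection rule $P_i\le\alpha$ is the standard one, and the use of the population standard deviation (NumPy's default) across realizations is an assumption, since the text does not say which normalization was used.

```python
import numpy as np

def type1_error_and_power(p_values, support, alpha):
    """Fraction of rejected nulls (type I error) and of rejected non-nulls (power)."""
    p_values = np.asarray(p_values, dtype=float)
    rejected = p_values <= alpha
    is_active = np.zeros(p_values.shape[0], dtype=bool)
    is_active[np.asarray(support)] = True        # indices i with theta_{0,i} != 0
    type1 = rejected[~is_active].mean()
    power = rejected[is_active].mean()
    return float(type1), float(power)

def summarize(results):
    """Mean and standard deviation over realizations of (type I error, power) pairs."""
    arr = np.asarray(results, dtype=float)
    return arr.mean(axis=0), arr.std(axis=0)

# Toy usage with made-up p-values for two realizations.
r1 = type1_error_and_power([0.01, 0.50, 0.03, 0.20], support=[0, 1], alpha=0.05)
r2 = type1_error_and_power([0.02, 0.01, 0.40, 0.90], support=[0, 1], alpha=0.05)
means, stds = summarize([r1, r2])
print(r1, r2, means, stds)
```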
Table 8: Comparison between our procedure (cf. Table 1), Bühlmann's method [Büh12] and the asymptotic
bound for our procedure (cf. Theorem 3.4) on the setup described in Section 3.1.1. The significance level is
α = 0.05 and Σ = I_{p×p}.
Method                                      Type I err   Type I err   Avg. power   Avg. power
                                            (mean)       (std.)       (mean)       (std.)
[rows for the preceding configurations are illegible in the source]
Asymptotic Bound (2000,600,20,0.1)          0.025        NA           0.47698      NA
Our testing Procedure (2000,600,100,0.15)   0.02484      0.00691      0.52700      0.09522
Bühlmann's method (2000,600,100,0.15)       0.00311      0.00154      0.22500      0.04007
Asymptotic Bound (2000,600,100,0.15)        0.025        NA           0.43511      NA
Our testing Procedure (2000,600,20,0.15)    0.03116      0.01304      0.81500      0.09443
Bühlmann's method (2000,600,20,0.15)        0.00727      0.00131      0.54500      0.09560
Asymptotic Bound (2000,600,20,0.15)         0.025        NA           0.84963      NA

Table 9: Comparison between our procedure (cf. Table 1), Bühlmann's method [Büh12] and the asymptotic
bound for our procedure (cf. Theorem 3.4) on the setup described in Section 3.1.1. The significance level is
α = 0.025 and Σ = I_{p×p}.
Method                                      Type I err   Type I err   Avg. power   Avg. power
                                            (mean)       (std.)       (mean)       (std.)
[rows for the preceding configurations are illegible in the source]
Lower bound (2000,600,20,0.1)               0.05         NA           0.58947      0.04472
SDL-test (2000,600,100,0.15)                0.05284      0.00949      0.61600      0.06802
Bühlmann's method (2000,600,100,0.15)       0.00895      0.00272      0.31800      0.04131
Lower bound (2000,600,100,0.15)             0.05         NA           0.64924      0.05312
SDL-test (2000,600,20,0.15)                 0.05318      0.00871      0.85500      0.11891
Bühlmann's method (2000,600,20,0.15)        0.01838      0.00305      0.68000      0.12517
Lower bound (2000,600,20,0.15)              0.05         NA           0.87988      0.03708

Table 10: Comparison between SDL-test, Bühlmann's method [Büh12] and the lower bound for the statistical
power of SDL-test (cf. Theorem 3.7) on the setup described in Section 3.2.1. The significance level is
α = 0.05 and Σ is the described circulant matrix.
Method                                      Type I err   Type I err   Avg. power   Avg. power
                                            (mean)       (std.)       (mean)       (std.)
[rows for the preceding configurations are illegible in the source]
Lower bound (2000,600,20,0.1)               0.025        NA           0.47549      0.06233
SDL-test (2000,600,100,0.15)                0.026737     0.009541     0.528000     0.062681
Bühlmann's method (2000,600,100,0.15)       0.002947     0.000867     0.236000     0.035653
Lower bound (2000,600,100,0.15)             0.025        NA           …            …
SDL-test (2000,600,20,0.15)                 …            …            0.79000      0.12202
Bühlmann's method (2000,600,20,0.15)        0.00732      …            …            …
Lower bound (2000,600,20,0.15)              0.025        NA           …            …

Table 11: Comparison between SDL-test, Bühlmann's method [Büh12] and the lower bound for the statistical
power of SDL-test (cf. Theorem 3.7) on the setup described in Section 3.2.1. The significance level is
α = 0.025 and Σ is the described circulant matrix.
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D Alternative hypothesis testing procedure 

SDL-TEST, described in Table 3, needs to compute an estimate of the covariance matrix $\Sigma$. Here,
we discuss another hypothesis testing procedure which leverages a slightly different form of the
standard distributional limit, cf. Definition 3.5. This procedure only requires bounds on $\Sigma$ that can
be estimated from the data. Furthermore, we establish a connection with the hypothesis testing
procedure of [Büh12]. We will describe this alternative procedure only briefly, since it is not the
main focus of the paper.

By Definition 3.5, if a sequence of instances $\mathcal{S} = \{(\Sigma(p),\theta(p),n(p),\sigma(p))\}_{p\in\mathbb{N}}$ has standard
distributional limit, then with probability one the empirical distribution of $\{(\hat\theta^u_i-\theta_{0,i})/[(\Sigma^{-1})_{ii}]^{1/2}\}_{i\in[p]}$
converges weakly to $N(0,\tau^2)$. We make a somewhat different assumption that is also supported
by the statistical physics arguments of Appendix B. The two assumptions coincide in the case of
standard Gaussian designs.

In order to motivate the new assumption, notice that the standard distributional limit is consistent
with $\hat\theta^u-\theta_0$ being approximately $N(0,\tau^2\Sigma^{-1})$. If this holds, then

$$\Sigma(\hat\theta^u-\theta_0) = \Sigma(\hat\theta-\theta_0)+\frac{d}{n}X^{\sf T}(y-X\hat\theta) \approx N(0,\tau^2\Sigma). \tag{88}$$

This motivates the definition of Oi = r-i(i;i,j)-i/2p(5)« _ q^^^.^ ^j^g^ assume that the empirical 
distribution of {{6o,i,0i, D)}i^[pj converges weakly to {Qo, Z, D), with Z ~ N(0, 1) independent of 

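Under the null hypothesis $H_{0,i}$, we get

$$\tilde\theta_i = \tau^{-1}(\Sigma_{i,i})^{-1/2}\Big[\Sigma(\hat\theta-\theta_0)+\frac{d}{n}X^{\sf T}(y-X\hat\theta)\Big]_i$$

$$= \tau^{-1}(\Sigma_{i,i})^{1/2}\hat\theta_i+\tau^{-1}(\Sigma_{i,i})^{-1/2}\Big[\frac{d}{n}X^{\sf T}(y-X\hat\theta)\Big]_i+\tau^{-1}(\Sigma_{i,i})^{-1/2}\,\Sigma_{i,\sim i}(\hat\theta_{\sim i}-\theta_{0,\sim i}),$$

where $\Sigma_{i,\sim i}$ denotes the vector $(\Sigma_{i,j})_{j\neq i}$. Similarly $\hat\theta_{\sim i}$ and $\theta_{0,\sim i}$ respectively denote the vectors
$(\hat\theta_j)_{j\neq i}$ and $(\theta_{0,j})_{j\neq i}$. Therefore,

$$\tau^{-1}(\Sigma_{i,i})^{1/2}\hat\theta_i+\tau^{-1}(\Sigma_{i,i})^{-1/2}\Big[\frac{d}{n}X^{\sf T}(y-X\hat\theta)\Big]_i = \tilde\theta_i-\tau^{-1}(\Sigma_{i,i})^{-1/2}\,\Sigma_{i,\sim i}(\hat\theta_{\sim i}-\theta_{0,\sim i}).$$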

Following the philosophy of [Büh12], the key step in obtaining a p-value for testing $H_{0,i}$ is to find
constants $\Delta_i$ such that asymptotically

EE r'i(Si,i)i/2^. + r-i(Si,i)-i/2[^xT(y - xe)]i ^ \Z\ + A^, (89) 

n 

where $Z\sim N(0,1)$, and $\preceq$ denotes "stochastically smaller than or equal to". Then, we can define
the p-value for the two-sided alternative as

Pi = 2{l-m\C^\-A,) + )). 

Control of type I errors then follows immediately from the construction of p- values: 

limsupP(Pj < a) < a, if -ffo,« holds. 
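
As an illustration of the p-value formula above, the following minimal Python sketch evaluates $P_i = 2(1 - \Phi((|\xi_i| - \Delta_i)_+))$ for given values of $\xi_i$ and $\Delta_i$. The function names `std_normal_cdf` and `two_sided_p_value` are hypothetical and do not come from the paper; the sketch assumes that $\xi_i$ and $\Delta_i$ have already been computed.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_sided_p_value(xi: float, delta: float) -> float:
    """Evaluate P_i = 2 * (1 - Phi((|xi| - delta)_+)).

    xi    : test statistic xi_i (assumed precomputed)
    delta : correction constant Delta_i >= 0 (assumed precomputed)
    """
    shifted = max(abs(xi) - delta, 0.0)  # positive part (|xi| - delta)_+
    return 2.0 * (1.0 - std_normal_cdf(shifted))


if __name__ == "__main__":
    # A larger Delta_i makes the test more conservative (larger p-value).
    for delta in (0.0, 0.5, 1.0):
        print(delta, two_sided_p_value(2.5, delta))
```

Whenever $|\xi_i| \le \Delta_i$ the positive part vanishes and the p-value equals one, so the null hypothesis is never rejected in that range.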
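In order to define the constant $\Delta_i$, we use an argument analogous to the one in [Büh12]: 

$$\big|\tau^{-1}(\Sigma_{i,i})^{-1/2}\,\Sigma_{i,\sim i}(\hat\theta_{\sim i} - \theta_{0,\sim i})\big| \le \max_{j \neq i}|\Sigma_{i,j}|\,\big(\tau^{-1}\Sigma_{i,i}^{-1/2}\big)\,\|\hat\theta - \theta_0\|_1.$$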
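
The bound above is an instance of Hölder's inequality. The intermediate steps are not spelled out in the text; a short derivation, in the notation of this section, could read:

$$\big|\Sigma_{i,\sim i}(\hat\theta_{\sim i} - \theta_{0,\sim i})\big| = \Big|\sum_{j \neq i} \Sigma_{i,j}(\hat\theta_j - \theta_{0,j})\Big| \le \max_{j \neq i}|\Sigma_{i,j}| \sum_{j \neq i} |\hat\theta_j - \theta_{0,j}| \le \max_{j \neq i}|\Sigma_{i,j}|\,\|\hat\theta - \theta_0\|_1.$$

Multiplying both sides by the positive factor $\tau^{-1}(\Sigma_{i,i})^{-1/2}$ gives the stated inequality.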

Recall that $\hat\theta = \hat\theta(\lambda)$ is the solution of the Lasso with regularization parameter $\lambda$. By the results 
of [BRT09, vdGB09], using $\lambda = 4\sigma\sqrt{(t^2 + 2\log p)/n}$, the following holds with probability at least 
$1 - 2e^{-t^2/2}$: 

$$\|\hat\theta - \theta_0\|_1 \le 4\lambda s_0/\phi_0^2, \tag{90}$$

where $s_0$ is the sparsity (number of active parameters) and $\phi_0$ is the compatibility constant. Assuming 
for simplicity $\Sigma_{i,i} = 1$ (which can be ensured by normalizing the columns of $X$), we can define 

$$\Delta_i = \frac{4\lambda s_0}{\tau \phi_0^2}\,\max_{j \neq i}|\Sigma_{i,j}|.$$

Therefore, this procedure only requires a bound on the off-diagonal entries of $\Sigma$, i.e., on $\max_{j \neq i}|\Sigma_{i,j}|$. It 
is straightforward to bound this quantity using the empirical covariance, $\hat\Sigma = (1/n)X^{\mathsf T}X$. 
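
The quantities entering $\Delta_i$ can be assembled as in the following minimal Python sketch. It assumes the $\ell_1$ bound (90) in the form $4\lambda s_0/\phi_0^2$ and the normalization $\Sigma_{i,i} = 1$. The values of $\sigma$, $t$, $\tau$, $s_0$ and $\phi_0$ are taken as given inputs (the sketch does not estimate them), and all function names are hypothetical.

```python
import math


def lasso_lambda(sigma: float, t: float, n: int, p: int) -> float:
    """Regularization level lambda = 4 * sigma * sqrt((t^2 + 2 log p) / n)."""
    return 4.0 * sigma * math.sqrt((t ** 2 + 2.0 * math.log(p)) / n)


def empirical_covariance_row(X, i):
    """Row i of the empirical covariance (1/n) X^T X; X is a list of n rows."""
    n, p = len(X), len(X[0])
    return [sum(row[i] * row[j] for row in X) / n for j in range(p)]


def max_offdiag(row, i):
    """Largest absolute off-diagonal entry in row i."""
    return max(abs(v) for j, v in enumerate(row) if j != i)


def delta_i(lam, s0, tau, phi0, offdiag_bound):
    """Delta_i = 4 * lam * s0 / (tau * phi0^2) * (bound on max_{j != i} |Sigma_ij|)."""
    return 4.0 * lam * s0 / (tau * phi0 ** 2) * offdiag_bound


if __name__ == "__main__":
    X = [[1.0, 0.2, -0.1], [-1.0, 0.1, 0.3], [0.5, -0.4, 0.2], [-0.5, 0.1, -0.4]]
    row0 = empirical_covariance_row(X, 0)
    lam = lasso_lambda(sigma=1.0, t=2.0, n=len(X), p=len(X[0]))
    # Plug-in value only; a valid upper bound also needs a deviation term.
    print(delta_i(lam, s0=1, tau=1.0, phi0=1.0, offdiag_bound=max_offdiag(row0, 0)))
```

The empirical maximum is used here only as a plug-in value. To obtain a valid upper bound on $\max_{j \neq i}|\Sigma_{i,j}|$, a deviation term has to be added to it.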

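Claim D.1. Consider a Gaussian design matrix $X \in \mathbb{R}^{n \times p}$, whose rows are drawn independently from 
$N(0, \Sigma)$. For any fixed $i \in [p]$, the following holds true with probability at least $1 - 6p^{-1/3}$: 

$$\max_{j \neq i}|\Sigma_{i,j}| \le \max_{j \neq i}|\hat\Sigma_{i,j}| + 20\sqrt{\frac{\log p}{n}}. \tag{91}$$

Proof. Let $\tilde x_i$ denote the $i$-th column of $X$, and write 

$$2\hat\Sigma_{i,j} = \frac{2}{n}\tilde x_i^{\mathsf T}\tilde x_j = \frac{1}{n}\|\tilde x_i + \tilde x_j\|^2 - \frac{1}{n}\|\tilde x_i\|^2 - \frac{1}{n}\|\tilde x_j\|^2.$$

Note that $\|\tilde x_i + \tilde x_j\|^2 \sim 2(1 + \Sigma_{i,j}) Z_n$, and $\|\tilde x_i\|^2 \sim Z_n$, where $Z_n$ is a chi-squared random 
variable with $n$ degrees of freedom. 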
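Hence, for any $c$ we have 

$$\begin{aligned}
P(\hat\Sigma_{i,j} \le \Sigma_{i,j} - c) &= P\Big(\frac{2}{n}\tilde x_i^{\mathsf T}\tilde x_j \le 2\Sigma_{i,j} - 2c\Big) \\
&\le P\Big(\frac{1}{n}\|\tilde x_i + \tilde x_j\|^2 \le 2(\Sigma_{i,j} + 1) - \frac{2c}{3}\Big) + P\Big(\frac{1}{n}\|\tilde x_i\|^2 \ge 1 + \frac{2c}{3}\Big) + P\Big(\frac{1}{n}\|\tilde x_j\|^2 \ge 1 + \frac{2c}{3}\Big) \\
&\le P\Big(Z_n \le n\Big(1 - \frac{c}{6}\Big)\Big) + 2P\Big(Z_n \ge n\Big(1 + \frac{2c}{3}\Big)\Big),
\end{aligned} \tag{92}$$

where in the last step, we used $\Sigma_{i,j} \le \Sigma_{i,i} = 1$. 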
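
For the reader's convenience, here is one way to verify the two inequalities above. This is a spelled-out step that is not in the original text, and it assumes that the slack $2c$ is split into three parts of size $2c/3$. If none of the three events in the second line occurs, then

$$\frac{2}{n}\tilde x_i^{\mathsf T}\tilde x_j = \frac{1}{n}\|\tilde x_i + \tilde x_j\|^2 - \frac{1}{n}\|\tilde x_i\|^2 - \frac{1}{n}\|\tilde x_j\|^2 > 2(\Sigma_{i,j} + 1) - \frac{2c}{3} - 2\Big(1 + \frac{2c}{3}\Big) = 2\Sigma_{i,j} - 2c,$$

so the event on the left-hand side cannot occur, and the union bound over the three events gives the second line. For the last line, $\|\tilde x_i + \tilde x_j\|^2 \sim 2(1 + \Sigma_{i,j})Z_n$ turns the first event into $Z_n \le n\big(1 - \tfrac{c}{3(1 + \Sigma_{i,j})}\big)$, and $1 + \Sigma_{i,j} \le 2$ implies $\tfrac{c}{3(1 + \Sigma_{i,j})} \ge \tfrac{c}{6}$.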
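Similarly, 

$$\begin{aligned}
P(\hat\Sigma_{i,j} \ge \Sigma_{i,j} + c) &= P\Big(\frac{2}{n}\tilde x_i^{\mathsf T}\tilde x_j \ge 2\Sigma_{i,j} + 2c\Big) \\
&\le P\Big(\frac{1}{n}\|\tilde x_i + \tilde x_j\|^2 \ge 2(\Sigma_{i,j} + 1) + \frac{2c}{3}\Big) + P\Big(\frac{1}{n}\|\tilde x_i\|^2 \le 1 - \frac{2c}{3}\Big) + P\Big(\frac{1}{n}\|\tilde x_j\|^2 \le 1 - \frac{2c}{3}\Big) \\
&\le P\Big(Z_n \ge n\Big(1 + \frac{c}{6}\Big)\Big) + 2P\Big(Z_n \le n\Big(1 - \frac{2c}{3}\Big)\Big).
\end{aligned} \tag{93}$$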
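Let $F_n(x) = P(Z_n \ge x)$. Then, combining Eqs. (92), (93), we obtain 

$$P(|\hat\Sigma_{i,j} - \Sigma_{i,j}| \ge c) \le \Big\{1 - F_n\Big(n\Big(1 - \frac{c}{6}\Big)\Big) + F_n\Big(n\Big(1 + \frac{c}{6}\Big)\Big)\Big\} + 2\Big\{1 - F_n\Big(n\Big(1 - \frac{2c}{3}\Big)\Big) + F_n\Big(n\Big(1 + \frac{2c}{3}\Big)\Big)\Big\}.$$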

We upper bound the above probability using the concentration of a chi-squared random variable 
around its mean. Indeed, applying the Chernoff tail bound for $Z_n$ (similar to the one in Corollary 2.7) 
and taking $c = 20\sqrt{\log p/n}$, we have that for $\log p/n < 0.01$, 

$$P\big(|\hat\Sigma_{i,j} - \Sigma_{i,j}| \ge 20\sqrt{\log p/n}\big) \le 6p^{-4/3}.$$

Using the union bound over $j \in [p]$, $j \neq i$, we get 

$$P\Big(\max_{j \neq i}|\hat\Sigma_{i,j} - \Sigma_{i,j}| \le 20\sqrt{\log p/n}\Big) \ge 1 - 6p^{-1/3}.$$
The result follows from the inequality 